02 What is Big Data and new ways to solve problems

Definition of the four Vs

  1. Volume (data at scale): terabytes to exabytes of data accumulated on ever cheaper storage
  2. Variety (data in many forms): structured and unstructured data, including text, images, video and other multimedia
  3. Velocity (data in motion): streaming data analytics
  4. Veracity (data uncertainty): managing the reliability and predictability of inherently imprecise data types

Other Vs: Value, Volatility

"oil" metaphor for Big Data, Data Science and Data Engineering

Data is just like crude oil. It's valuable, but if unrefined it cannot really be used. It has to be changed into gas, plastic, chemicals, etc. to create a valuable entity that drives profitable activity; likewise, data must be broken down and analyzed for it to have value.

  • Exploration: just like we need to find oil, we need to locate relevant data before we can extract it
  • Extraction: after locating the data (oil), we need to extract it
  • Transform: we then need to clean, filter and aggregate data (oil)
  • Storage: data (oil) needs to be stored and this may be challenging if it is huge
  • Transport: getting the data (oil) to the right person, organization or software tool (to the petrol station)
  • Usage: driving a car consumes oil; similarly, producing analysis results consumes data
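
The six stages above can be sketched as a tiny pipeline. This is only a minimal illustration under stated assumptions, not a real data platform: the source names, the records, and every function name (`explore`, `extract`, `transform`, `store`, `transport`, `use`) are hypothetical, the "storage" is just a JSON string, and the "transport" is a local function call.

```python
import json
from collections import defaultdict

# Hypothetical in-memory "sources"; names and records are invented for illustration.
SOURCES = {
    "sales_eu": [
        {"region": "EU", "amount": "120.5"},
        {"region": "EU", "amount": ""},       # dirty record: missing amount
        {"region": "EU", "amount": "79.5"},
    ],
    "sales_us": [
        {"region": "US", "amount": "250"},
        {"region": "US", "amount": "-1"},     # invalid record: negative amount
    ],
    "hr_records": [{"employee": "n/a"}],      # irrelevant to a sales analysis
}


def explore(sources, keyword):
    """Exploration: locate the sources that look relevant."""
    return [name for name in sorted(sources) if keyword in name]


def extract(sources, names):
    """Extraction: pull the raw records out of the located sources."""
    return [record for name in names for record in sources[name]]


def transform(records):
    """Transform: clean (drop unparsable), filter (drop negatives), aggregate (sum per region)."""
    totals = defaultdict(float)
    for record in records:
        try:
            amount = float(record.get("amount", ""))
        except ValueError:
            continue  # cleaning: skip records whose amount cannot be parsed
        if amount < 0:
            continue  # filtering: skip records with a negative amount
        totals[record["region"]] += amount
    return dict(totals)


def store(totals):
    """Storage: persist the refined data (here, simply serialized to a JSON string)."""
    return json.dumps(totals, sort_keys=True)


def transport(payload):
    """Transport: deliver the stored data to its consumer (here, just decode it)."""
    return json.loads(payload)


def use(totals):
    """Usage: produce an analysis result, e.g. the region with the highest total."""
    return max(totals, key=totals.get)


def run_pipeline(sources, keyword):
    names = explore(sources, keyword)
    raw = extract(sources, names)
    totals = transform(raw)
    delivered = transport(store(totals))
    return delivered, use(delivered)


if __name__ == "__main__":
    totals, best_region = run_pipeline(SOURCES, "sales")
    print(totals, best_region)
```

In this sketch each stage takes only the output of the previous one, which mirrors the refinery picture: raw records are of little use until they have been located, extracted, and refined into per-region totals.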

However, there are some important differences between data and oil:

  • Copying data is relatively easy and cheap, whereas it is impossible to simply copy a product like oil.
  • Data is specific, i.e., it relates to a specific event, object, and/or period. Different data elements are not interchangeable. Oil is very different: at a petrol station, drops of oil are not preallocated to a specific car on a specific day.
  • Typically, data storage and transport are cheap (unless the data is really Big Data). In a communication network, data may travel (almost) at the speed of light and storage costs are much lower than the storage costs of oil.

New ways to solve problems

Data driven value