As you are probably aware, the total amount of data in the world is exploding. The advisory and market intelligence organization IDC estimates that in 2018 we produced 33 ZB (zettabytes, i.e. billions of terabytes) of new data and that this number is going to grow to 175 ZB by 2025.
If you are interested in detailed numbers for popular internet services (“Instagram users post 49,380 photos every minute”), check the report called Data Never Sleeps 6.0, crafted by DOMO.
But what implications does this “Big Data flood” have at the individual level? For the average person who probably isn’t in the “data industry”?
I don’t think anyone can say for sure. There are some approximate historical parallels, but for all practical purposes this is the first time we’ve been here as a society. What I can say is that everyone should be paying close attention to three key issues in the “Big Data floodplain” – artificial intelligence, automation, and privacy – that are already making everyday life look significantly different than a few years ago.
Let’s start with some background. Or maybe a catchy metaphor?
It’s not certain, but it was most likely Clive Humby who first used the phrase “data is the new oil,” back in 2006. The catchy phrase became quite popular, partly because the metaphor behind it leaves plenty of room for interpretation.
Consider, for example, how an entire industry has emerged around the collection, storage, and processing of data – just as a similar business ecosystem exists around oil. The “data industry” includes the world’s five biggest companies by market capitalization: Apple, Alphabet (Google’s parent company), Microsoft, Facebook, and Amazon.
It might be a stretch to say that “Apple is worth more than Poland,” because it makes little sense to compare a company’s market capitalization ($921.13B as of April 3rd, 2019) with a country’s GDP ($581B, or $1.271T adjusted for Purchasing Power Parity). Still, the reality is that no country could realistically buy any of these five companies even if it were technically feasible to do so. And in fact, no country is able to exercise control over the biggest tech giants.
That fact alone should indicate how data is significantly shaping how the world works in terms of not only the global economy but also the daily life of millions of people.
The rise of the data industry and its effects therefore provoke a lot of questions, such as how to define a monopoly, how to measure the power and influence of global companies, how to ensure equilibrium between them and national authorities, and how to protect consumers against abuse. Data-related products and services are often so technically complex that nobody can predict what risks and problems they may bring.
The analogy to oil doesn’t end with the fact that data is a resource that shapes the world and creates many potential problems. Just as fuel powers an engine, data powers a big data platform, a cluster of multiple computers working together.
The terms data science and big data platform are somewhat blurry, but it doesn’t make much sense to argue about definitions. The thing to know is that an enormous amount of low-hassle processing power is available even to companies of moderate size, mostly thanks to large public clouds like Amazon Web Services, Microsoft Azure, and Google Cloud Platform. The amount of data available for analysis is unprecedented as well.
This combination of data and processing power delivers almost endless possibilities, some of the most impactful of which involve artificial intelligence.
However, like oil, data has to be processed and refined in order to be valuable. Most of the data collected and used today starts out with little structure and is full of garbage and errors.
Long story short, data by itself is basically useless in raw form. To “distill” the information from it, we need to apply some “chemistry” and “materials engineering.” This is the moment when the mysterious term data science enters the stage in real life. Even though data science is a broad and complex discipline requiring significant knowledge, the reality is that a huge share of data scientists’ working time is devoted to cleaning and preparing data for further analysis.
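To make that cleaning work concrete, here is a minimal sketch in plain Python. The records, field names, and cleaning rules are all hypothetical, purely for illustration – real pipelines deal with the same kinds of problems (inconsistent formatting, duplicates, missing values) at vastly larger scale.

```python
# Hypothetical raw records: inconsistent casing, stray whitespace,
# an exact duplicate, and a missing value -- typical of data "in the wild".
raw = [
    {"name": "  Alice ", "age": "34"},
    {"name": "BOB", "age": ""},
    {"name": "  Alice ", "age": "34"},   # duplicate of the first row
    {"name": "carol", "age": "29"},
]

def clean(records):
    """Normalize text fields, parse numbers, drop incomplete rows and duplicates."""
    seen, result = set(), []
    for rec in records:
        name = rec["name"].strip().title()   # "  Alice " -> "Alice"
        age = rec["age"].strip()
        if not age:                          # drop rows missing the age
            continue
        key = (name, age)
        if key in seen:                      # drop duplicates
            continue
        seen.add(key)
        result.append({"name": name, "age": int(age)})
    return result

print(clean(raw))
# [{'name': 'Alice', 'age': 34}, {'name': 'Carol', 'age': 29}]
```

Only after steps like these does the data become usable for the analysis and modeling discussed below.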
But when the data is clean and ready, some pretty clever methods can be used to make sense of it. You have probably already heard of Artificial Intelligence. Is it as complex and close to human intelligence as it seems?
Today’s systems falling under the umbrella term artificial intelligence (AI) are oftentimes loosely inspired by how the human brain functions and can solve some surprisingly complex problems. Even so, they are pretty far from actually being intelligent in the human sense.
For the most part, they learn in a narrow, particular way that essentially involves an algorithm functioning in a way not strictly predetermined by the programmer. The programmer (who in the AI domain usually prefers to be called a data scientist) specifies a model with some adjustable moving parts, i.e. parameters. The model is fitted to the expected result by minimizing a predefined error or maximizing a predefined success rate, either based on labeled examples in the historical data (so-called supervised learning) or by looking for not-yet-defined patterns in it (unsupervised learning). One system usually solves one problem, adjusting to the statistical relationships that can be found in its training data.
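As a deliberately tiny illustration of the supervised case, here is a one-parameter model fitted to labeled examples by gradient descent on squared error, in plain Python. The data, the learning rate, and the number of iterations are made up for the example; real models have millions of parameters, but the principle is the same.

```python
# Labeled training examples (supervised learning): inputs with known answers.
# The underlying relationship happens to be y = 2x, but the algorithm
# is never told that -- it has to discover it from the data.
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = 0.0              # the model's single adjustable parameter
learning_rate = 0.05

# Repeatedly nudge w in the direction that reduces the squared error
# between the model's prediction (w * x) and the labeled answer (y).
for _ in range(200):
    for x, y in examples:
        error = w * x - y                     # prediction minus label
        w -= learning_rate * 2 * error * x    # gradient of (w*x - y)**2 w.r.t. w

print(round(w, 3))   # converges to 2.0, the pattern hidden in the data
```

Nothing here was “programmed” to know that the answer is 2; the parameter simply settled on the value that best fits the statistical relationship in the training examples – which is exactly what is meant above by a model conforming to the data.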
But don’t get discouraged, data enthusiast! Even relatively straightforward or narrowly targeted AI algorithms can bring tremendous value to a business, simply by generalizing and combining the knowledge of multiple experts. For the sake of simplicity, I am also not touching on the very trendy topic of deep learning, a group of AI methods in which the “primitive” systems described above get pretty complex: composite models are assembled from more primitive ones to handle subproblems of a task, models can change their own structure, and/or the algorithm figures out the necessary input features from raw data on its own. This is enough for machines to outperform humans and step into their shoes for a growing range of tasks, even in fields we previously thought were reserved for people, such as visual arts, music, playing complex games, and recognizing objects in photographs. These advances are both fascinating and terrifying, as for some they suggest a future of mass unemployment, especially considering how dynamically AI is developing.
However, to be fair, we are still far from creating what is known as Strong AI or Artificial General Intelligence (AGI), a system that has the intellectual and cognitive abilities of a human in all respects. Such a system would be able to correct and enhance itself to create even better AI, which would repeat the process, starting a snowball of technical development impossible to control or predict.
This phenomenon is known as the singularity and is a popular theme in science fiction. Achieving the singularity – or something that would lead to it – has been predicted a number of times already. In 1965, Herbert A. Simon, an AI pioneer, supposedly wrote that “machines will be capable, within twenty years, of doing any work a man can do.” Today’s specialists in the field do not agree on whether AGI is actually possible, when it could happen, or even on the criteria that would define Strong AI.
More often than not, you will encounter anxiety or even alarm about AI-based systems getting out of control – even without being self-aware or completely self-reliant – due to design flaws caused by human error, or to being used with bad intentions. This includes vulnerability to hacking, caused either by a lack of security-consciousness among the designers of AI systems or simply by the growing business impact of AI, which makes it an increasingly attractive target for black hats.
If you are interested in deeper insight into the problems mentioned above – namely the dangers of AI taking over the world, the effects of AI on the job market, and the privacy implications of the data-driven systems we all use – stay tuned, as we’ll have dedicated publications on each of these topics for you.
Lingaro is a professional services company specializing in analytics, artificial intelligence and e-commerce. We drive digital innovation and digitally-enabled operations for Fortune Global 500 enterprises. We are a fast-growing team with the can-do culture of a successful startup and the reliable, scalable delivery capabilities that leading organizations demand of their technology partners.
Dominik Maj is an IT consultant and software developer focused on data-processing systems, currently working on Lingaro’s Big Data team. After hours, he’s more interested in the social and cultural side of things, especially music.