Friday, September 14, 2018

The Formula of the Data Revolution

Introduction

Data does not equal money. Data is an opportunity to earn money. Likewise, money does not equal data. Money is an opportunity to buy data.

Today there are many users and applications that produce a lot of data. We are going to call them "Data Producers". At the other extreme, there are companies that consume data to improve their business, while many others are eager to gain access to these assets. Let's call them "Data Consumers". In the middle, a large group of people extract value from raw data that means little on its own. We call them "Data Scientists".

Some statistics about the data ecosystem today:

• 2.5 billion active smartphone users

• 5 million mobile applications available

• 6.9 million machine learning developers and data scientists

• US$250 billion spent on data per year

It looks promising, and this seems to be a solid industry, but...

• Users have no control over their data, including their private information

• Apps do not earn money from their data (apart from the "lucky" 1%)

• Data scientists do not have access to open and diversified datasets

• Only a group of large companies (with huge portfolios) has access to this valuable resource

Data scientists and data consumers should be free to be creative about what data they collect and what it can be used for. Today, however, it is up to the tech giants to decide what is collected and what is not.

We are at the beginning of the Data Revolution, and we have a great opportunity to do the right thing: connect the players and free up resources to create a better world, instead of leaving this huge business in the hands of a few companies that control everything, from the user's privacy to the developer's fate.

Application use

The average app user uses about 20 different applications a month and spends more than 2 hours a day in them.

This adds up to billions of hours invested in digital media, and therefore trillions of bytes generated by these applications. But that is just the beginning. The large-scale integration of smart devices (the "Internet of Things") into our daily lives will increase application use even further and, with it, the amount of data generated.

As you can see, the amount of data is growing at a tremendous rate, but we also need to talk about which data points are being collected and what the consequences of a skewed dataset might be.

The problem of data distortion

There have been several widely publicized cases recently in which Artificial Intelligence systems made foolish mistakes that strongly affected public opinion, drawing media attention to issues such as AI bias.

But the problem is not the intelligence of this sophisticated software; it is the data used to feed it. Raw data is the mechanism through which learning happens and from which conclusions about interests, behaviours and trends are drawn.

It is well known that, in general, the more data available to train a Machine Learning model, the better. With more data, greater accuracy can be achieved and anomalies can be better separated from the underlying "reality". But precision does not mean objectivity. Microsoft's chatbot Tay learned, quickly and precisely, that "Ricky Gervais learned totalitarianism from Adolf Hitler, the inventor of atheism."

The problem was not Tay or its AI implementation, but the data fed into it. When working with data, especially in deep learning models, you can have a large dataset that is fundamentally flawed because it is incomplete, leaves out certain groups, reinforces stereotypes (intentionally or not), or has other problems that cause the model to acquire the "wrong" knowledge.
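To make the point concrete, here is a toy sketch of how a dataset that underrepresents one group can produce a model that looks accurate overall while being wrong for that group. The data, the two groups and the "model" (a majority vote per feature value) are all hypothetical illustrations; this is not how Tay or any real system was built.

```python
from collections import Counter

def make_records(group, n, flip):
    # Hypothetical toy data: feature x cycles through 0..9 and the "true" label is x > 4,
    # except that for a group created with flip=True the relationship is reversed.
    return [(group, i % 10, (i % 10 > 4) != flip) for i in range(n)]

# Skewed dataset: group "B" behaves differently but makes up only 5% of the records.
train = make_records("A", 950, flip=False) + make_records("B", 50, flip=True)

def fit_majority_per_x(data):
    # Deliberately simple "model": for each x, predict the label seen most often in training.
    # It never looks at the group, so the majority group's pattern wins.
    counts = {}
    for _, x, label in data:
        counts.setdefault(x, Counter())[label] += 1
    return {x: c.most_common(1)[0][0] for x, c in counts.items()}

def accuracy(model, data):
    # Fraction of records whose label matches the model's prediction.
    return sum(model[x] == label for _, x, label in data) / len(data)

model = fit_majority_per_x(train)
test = make_records("A", 950, flip=False) + make_records("B", 50, flip=True)  # same skew as training

overall = accuracy(model, test)
group_a = accuracy(model, [r for r in test if r[0] == "A"])
group_b = accuracy(model, [r for r in test if r[0] == "B"])
print(f"overall: {overall:.2f}, group A: {group_a:.2f}, group B: {group_b:.2f}")
# prints: overall: 0.95, group A: 1.00, group B: 0.00
```

The 95% overall figure hides the fact that every prediction for group B is wrong: high accuracy on the data you have says nothing about the people who are missing from it.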

Data Brokerage Business

Data Brokers are companies that collect, analyze and sell information about users. This may include personal information, demographic information, inferred behaviours, interests and purchase intentions. These companies are middlemen: they buy your data from tracking companies, search marketing professionals and retailers, build profiles from it, and then sell the results to advertisers, insurers, banks or other entities interested in knowing more about you, or even in trying to manipulate you.

If you are a freelancer, app developer, publisher or small business that wants to acquire user data, say, to learn more about your app's visitors and improve the user experience, your only option is to buy demographic and static data from providers such as Pipl, DataFinder, TowerData or Clearbit, among others. If you want access to Acxiom, Lotame, Nielsen, Oracle Cloud Data or Equifax... good luck with that! They are too busy making $250 billion a year to take your call.

Personal Data as an Asset

Your personal information has value. Period. Companies pay to reach you, or people like you, with targeted advertising. Others pay to learn more about your purchase history so they can estimate the risk of offering you products such as credit cards.
If you want a rough idea of what your data is worth, try the calculator created by the Financial Times.

As early as 2011, the World Economic Forum declared personal data a new asset class, describing it as "a valuable resource for the 21st century that will touch all aspects of society". It also published a chart showing the economic value chain of personal data.

There are many companies at every stage of this value chain, and together they have built a rich ecosystem to produce and capitalize on your personal data. If we agree with The Economist and call your data "the new oil", then there is an entire industry refining, separating, converting and treating that crude oil to sell it as fuel. In this case, of course, you get nothing for being the source of the raw supply.

The formula

So we want people to control the data they generate. We want app owners to earn money from their applications without compromising the user's experience or privacy. We need Data Scientists to work with the data assets extracted from those applications, and we need transparent and fair access for Data Consumers to those raw or processed assets.

But we also need to see this flow, Users -> Applications -> Data Scientists (DS) -> Data Consumers (DC), inverted: DC -> DS -> Applications -> Users. What if a data consumer or data scientist could send a request to collect or generate a specific type of application or user data?
What if we put a brake on this concentrated data brokerage business and allowed anyone to request and buy data publicly? What if that request reached millions of app owners and publishers, who then worked with their users to co-produce the requested data?


By establishing a direct connection between application developers and Data Consumers (and their data scientists), you can unleash creativity in data collection, both from application developers ("Wait a minute, I can also track this!") and from data consumers ("If only I could find this subset of users...").
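The inverted flow can be pictured as a simple request router. The sketch below is a hypothetical illustration only: `DataRequest`, `App`, `route_request`, the per-user consent model and the per-record price are all assumptions made for the example, not a description of an existing platform or protocol.

```python
from dataclasses import dataclass, field

@dataclass
class DataRequest:
    # Hypothetical public request posted by a data consumer (or its data scientists).
    requester: str
    data_type: str
    price_per_record: float

@dataclass
class App:
    # Hypothetical app publisher: the data types the app can collect, and the
    # data types each of its users has explicitly agreed to share.
    name: str
    collectable: set
    consents: dict = field(default_factory=dict)  # user_id -> set of data types

def route_request(request, apps):
    # Inverted flow (DC -> DS -> Applications -> Users): the request is broadcast to
    # every app able to collect the data type, and only opted-in users are included.
    offers = []
    for app in apps:
        if request.data_type not in app.collectable:
            continue
        willing = sorted(u for u, types in app.consents.items() if request.data_type in types)
        if willing:
            # Total offered for this app's records; how it is split between the
            # publisher and its users is left open in this sketch.
            offers.append((app.name, willing, len(willing) * request.price_per_record))
    return offers

apps = [
    App("RunTracker", {"daily step count"}, {"u1": {"daily step count"}, "u2": set()}),
    App("NewsReader", {"reading time"}, {"u3": {"reading time"}}),
]
request = DataRequest("research-lab", "daily step count", 0.10)
offers = route_request(request, apps)
print(offers)  # [('RunTracker', ['u1'], 0.1)]
```

Consent is checked per data type, so a user who installed the app but never opted in (`u2` here) is never part of an offer; that is one way to keep users in control of the data they generate while still letting requests flow from consumers back to apps.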
