Introduction
I have been interested for some time in how data scientists can better communicate data analysis activities to each other and to people outside the field. I think our current methods are inadequate because they were mostly borrowed from other areas (in particular, computer science). Many of these tools are useful, but they were not designed specifically to communicate data analysis concepts and often fall short of expectations. I talked about this problem at my Dean's Lecture earlier this year, and about how the field of data science could benefit from developing its own theories to simplify communication, as other fields have done.
One thing I have noticed is that the development of other fields can be seen in part as a trend toward increased specialization. As people in a field increasingly specialize in a sub-specialty, there is a parallel need for those experts to communicate and coordinate with each other to produce a complete product. Over time, the splitting of a field into a collection of specialists drives the development of communication tools that can serve as mutually agreed-upon hubs of information exchange. Without proper tools, the communication overhead involved in adding more people to a project becomes too large and the whole enterprise can collapse.
I thought it might be helpful to talk about some of these other areas and how they handled increased specialization and the separation of tasks by developing communication tools. Tracing the history of other fields is instructive because it provides a basis for thinking about data analysis. Listeners of my podcast with Hilary Parker know that we regularly have a segment that we refer to as "Singing the Analogy", and this is the Simply Statistics version.
Specialization in other areas
The first example comes from film and the development of the screenplay. The Script Lab describes the history of the screenplay and how filmmaking worked before the screenplay was developed:
In contemplating the history of screenwriting, you cannot separate the theories of screenwriting from the evolution of film production. The first films were often solo projects, from conception to completion. Known as the "camera system," this was the most primitive form of cinematography. Soon, directors became instrumental in the process, but most films were shot with only a vague idea of what the director wanted to film. Often, the crews waited while the director planned what to shoot next.
The films were one-man projects and developed more or less linearly. It was an inefficient system: most of today's films are produced non-linearly to accommodate actors' schedules and various production processes.
Today, the screenplay serves as a critical communication hub around which many film departments (costumes, makeup, hair, props, sets) can organize their activity. Imagine if the representatives of each of these departments had to consult individually with the writer or director on every detail of their work. It would be a nightmare of exponentially increasing complexity. With a written document, such as a script, that everyone can agree is the authority on "what's happening in the movie," people can do their jobs without the need for constant communication 24 hours a day.
The second analogy comes from finance. In finance, there was a similar development of specialization in connection with limited liability. Here, "specialization" refers to the separation of a company's owners from its managers. As a result, there must be a way for company managers to communicate to investors what exactly is going on with the company's operations. Hence the development of financial statements, accounting rules and various publicly available documents that allow investors to analyze the health of a company. Graham and Dodd's seminal Security Analysis is essentially a plea for investors to evaluate companies based on publicly available data, rather than on common myths and legends about what makes a good or safe investment. Today, with the separation of owners from managers and the creation of standardized communication formats between the two (e.g., S-1, 10-K, 10-Q), we have the basis of the global capital market system.
The last analogy comes from Western classical music, where there is often a division between the composer and the performer. In more complex symphonic music, it can be said that there are three roles: the composer, the performer and the conductor. However, in early classical music, such a division did not exist and composers often performed their own pieces, often alone. In this configuration, there is not much need to write things down, since the music can be stored in and played from the head of the composer. This idea was well captured in the movie Amadeus, where Mozart describes his opera The Magic Flute as being "up here in my noodle" (the rest is just scribbling and bibbling).
Of course, opera could be the ultimate example in classical music of a setting where some kind of communication tool is needed to coordinate musicians, singers and set designers. So for most classical music, we have the score, which specifies what each instrument and singer is doing at a given time. There is a standardized notation that allows others unfamiliar with the composer to quickly understand what is happening and to gather the time and resources needed to perform the work.
What about data analysis?
In today's data science, or really in science generally, much of what happens follows the "vertical integration" model, where the same person asks the question, collects the data and analyzes it. The need for communication methods does not really arise until the work has to be shared with others (including oneself in the future). In large collaborations, where communication about the analysis has to happen from the beginning, my experience has been that even in the best scenarios the methodology is ad hoc and difficult to recreate in another project involving different people.
Most agree that the software code that actually performs the analysis is an important component of communicating what is being done. However, not everyone needs or wants all the details provided by the code. Perhaps a concept we could steal from music is the distinction between the score and the parts. In a symphony, the conductor needs the full score because they need to know what everyone is doing at all times. But the first violinist only reads the first violin part; there is no need to read the full score to play a vital role in the creation of the final product.
Developing appropriate communication tools for data science is critical both to large-scale data analysis, so that more people can participate, and to reproducibility/repeatability, so that more people can understand what happens in an analysis. Until then, I think we're going to keep borrowing tools from other fields for the data science process, and that's fine. These tools are useful, but I think, ultimately, they are not a perfect fit.