Thursday, October 25, 2018

Specialization and Communication in Data Science



Introduction

I have been interested for some time in how data scientists can better communicate data analysis activities, both with each other and with people outside the field. I think our current methods are inadequate, largely because they were borrowed from other fields (in particular, computer science). Many of these tools are useful, but they were not designed specifically to communicate data analysis concepts, and they often fall short. I talked about this problem in my Dean Conference talk earlier this year, and about how the field of data science could benefit from developing its own theory, as other fields have, to simplify communication.

One thing I have noticed is that the development of many other fields can be seen, in part, as a trend toward increased specialization. As people in a field increasingly specialize in sub-specialities, there is a parallel need for those experts to communicate and coordinate with each other to produce a complete product. Over time, the splitting of a field into a collection of specialists drives the development of communication tools that serve as mutually agreed-upon hubs of information exchange. Without such tools, the communication overhead of adding more people to a project becomes too large, and the whole enterprise can collapse.

I thought it might be helpful to talk about some of these other fields and how they dealt with increasing specialization and the separation of tasks by developing communication tools. Tracing the history of other fields is instructive because it gives us a lens through which to view data analysis. Listeners of my podcast with Hilary Parker know that we regularly have a segment we call "Singing the Analogy", and this is the Simply Statistics version.

Specialization in other fields

The first example comes from film and the development of the screenplay. The Script Lab describes the history of screenwriting and how filmmaking worked before the screenplay was developed:

In contemplating the history of screenwriting, you cannot separate the theory of screenwriting from the evolution of film production. The first films were often one-person projects, from conception to completion. Known as the "camera system," this was the most primitive form of filmmaking. Soon, directors became central to the process, but most films were shot with only a vague idea of what the director wanted. Often, crews waited while the director planned what to shoot next.

Films were largely one-person projects and were developed more or less linearly. By today's standards this was an inefficient system: most films today are shot non-linearly to accommodate actors' schedules and the various production processes.

Today, the screenplay serves as a critical communication hub around which the many departments on a film (costumes, makeup, hair, props, sets) can organize their work. Imagine if representatives from each of these departments had to consult individually with the writer or director on every detail of their work: the coordination burden would grow rapidly with every department added. With a written document such as a script, which everyone accepts as the authority on "what happens in the movie," people can do their jobs without needing constant, around-the-clock communication.
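
To make the coordination argument concrete, here is a back-of-the-envelope sketch in Python. It assumes a deliberately simplified model that is not from the original post: without a shared document, every pair of departments may need to coordinate directly, giving n(n - 1)/2 channels for n departments; with a script acting as the hub, each department only needs to work from that one document, giving n links. The function names are hypothetical.

```python
def pairwise_channels(n: int) -> int:
    """Direct channels if every one of n departments coordinates with every other one."""
    return n * (n - 1) // 2


def hub_channels(n: int) -> int:
    """Links if each of n departments works only from one shared document (the script)."""
    return n


if __name__ == "__main__":
    # Compare the two toy models for a few team sizes.
    for n in (5, 10, 20, 50):
        print(f"{n:>2} departments: {pairwise_channels(n):>4} direct channels "
              f"vs {hub_channels(n):>2} links to a shared script")
```

Under this toy model, 10 departments need 45 direct channels but only 10 links to the script, and 50 departments need 1,225 versus 50. The point is the growth rate: direct coordination grows roughly with the square of the team size, while coordination through a shared document grows linearly.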

The second analogy comes from finance. In finance, a similar kind of specialization developed with the rise of limited liability companies. Here, "specialization" refers to the separation of a company's owners from its managers. As a result, there must be a way for managers to communicate to investors what exactly is going on in the company's operations. This need drove the development of financial statements, accounting rules, and various public disclosures that allow investors to analyze the health of a company. Graham and Dodd's seminal Security Analysis is essentially a plea for investors to evaluate companies based on publicly available data rather than on myths and legends about what makes a good or safe investment. Today, the separation of owners and managers, together with standardized communication formats between the two (e.g., S-1, 10-K, 10-Q), forms the basis of the global capital market system.

The last analogy comes from Western classical music, where there is often a division between the composer and the performer. In more complex symphonic music, one could say there are three roles: the composer, the performers, and the conductor. In early classical music, however, this division often did not exist, and composers frequently performed their own works, sometimes alone. In that setting there is little need to write things down, since the music can be stored in, and played from, the composer's head. This idea is captured well in the movie Amadeus, where Mozart describes his opera The Magic Flute as being "up here in my noodle" (the rest is just "scribbling and bibbling").

Of course, opera is perhaps the ultimate example in classical music of a setting where some kind of communication tool is needed to coordinate musicians, singers, and set designers. So for most classical music we have the score, which specifies what each instrument and singer is doing at any given time. A standardized notation allows others unfamiliar with the composer to quickly understand what is happening and to marshal the time and resources needed to perform the work.

What about data analysis?

In today's data science, and really in science generally, much of what happens follows a "vertically integrated" model, in which the same person asks the question, collects the data, and analyzes it. The need for communication methods does not really arise until the work must be passed on to others (including one's future self). In large collaborations, where communication about the analysis has to happen from the start, my experience has been that even in the best scenarios the methodology is ad hoc and difficult to reproduce in another project involving different people.

Most people agree that the software code that actually performs the analysis is an important part of communicating what was done. However, not everyone needs or wants all the detail the code provides. Perhaps one concept we could borrow from music is the distinction between the score and the parts. In a symphony, the conductor needs the full score, because they need to know what everyone is doing at all times. But the first violinist only reads the first violin part; they do not need to read the full score to play a vital role in creating the final product.
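
As a rough illustration of how the score/parts idea might map onto an analysis, the sketch below represents a hypothetical analysis plan as an ordered list of steps, each tagged with the roles that need to see it. The full list plays the role of the score (what a lead analyst, like a conductor, reads), and filtering it by role produces a part. The `Step` class, the `part` function, and all role and step names are invented for illustration; this is not a tool described in the post.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One step of a hypothetical analysis "score"."""
    description: str
    roles: frozenset[str]  # the roles that need to know about this step


# Hypothetical full score: every step of the analysis, in order, tagged by audience.
SCORE = [
    Step("Pull raw records from the source system", frozenset({"lead analyst", "data engineer"})),
    Step("Drop records with a missing outcome", frozenset({"lead analyst", "reviewer"})),
    Step("Fit the model and compute estimates", frozenset({"lead analyst", "reviewer"})),
    Step("Summarize the estimates for the report", frozenset({"lead analyst", "stakeholder"})),
]


def part(score: list[Step], role: str) -> list[str]:
    """Extract one role's "part": only the steps that role needs, in score order."""
    return [step.description for step in score if role in step.roles]


if __name__ == "__main__":
    # The lead analyst, like a conductor, sees the whole score.
    print(part(SCORE, "lead analyst"))
    # A stakeholder, like a first violinist, sees only their own part.
    print(part(SCORE, "stakeholder"))
```

Deriving every part from a single score, rather than maintaining a separate document for each role, means the parts cannot drift out of sync with the full analysis, roughly as orchestral parts are copied out from one score.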

Developing appropriate communication tools for data science is critical both for scaling data analysis, so that more people can participate, and for reproducibility and replicability, so that more people can understand what happens in an analysis. Until then, I think we will keep borrowing tools from other fields for the data science process, and that's fine. These tools are useful, but I think that ultimately they are not a perfect fit.
