Data science, also known as data-driven science, is the use of scientific methods, processes, and systems to extract knowledge or insights from data in a variety of forms, i.e. structured or unstructured. Machine learning is a type of artificial intelligence that makes computers capable of learning on their own, that is, without being explicitly programmed.
Whether you seek a deeper understanding of data science by studying a specialty, or simply want an overview of this fast-moving field, mastering the correct terminology will fast-track your success in your education and professional career.
1. Business Intelligence (BI). BI is the process of analyzing data and generating reports on historical performance to guide future decision making. By determining what has happened in the past, using data such as sales statistics and operational metrics, BI helps leaders make better strategic decisions going forward.
2. Data engineering. Data engineers build the infrastructure through which data is collected, cleaned, stored, and prepared for use by data scientists. Good data engineers are invaluable, and building a data science team without them is a "cart before the horse" approach.
3. Decision science. In the context of data science, decision scientists apply mathematics and technology to solve business problems, adding in behavioral science and design thinking (a process that aims to better understand the end user).
4. Artificial Intelligence (AI). An artificial intelligence system enables a computer to perform tasks that normally require human intelligence. This does not necessarily mean replicating the human mind, but rather using human reasoning as a model for providing better services or improving products, such as speech recognition, decision making, and language translation.
5. Machine learning. A subset of AI, machine learning refers to the process by which a system learns from input data, identifying patterns in that data which it then applies to new problems or requests. It allows data scientists to teach a computer to carry out tasks, rather than programming it to perform each task step by step. It is used, for example, to understand consumer preferences and buying patterns in order to recommend products on Amazon, or to scan résumés and identify the highest-potential job candidates based on keywords and phrases.
6. Supervised learning. In this particular type of machine learning, data scientists act as guides, teaching the algorithm the conclusions it should reach. For example, a computer learns to identify animals by being trained on image data correctly labeled with each species and its characteristics.
7. Classification. A classification algorithm, an example of supervised learning, places a new piece of data into a pre-existing category based on what it knows about examples already assigned to that category. For example, it can be used to determine whether a customer is likely to spend more than $20 online, based on their similarity to other customers who have previously exceeded that amount.
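As a minimal sketch of that example, the snippet below trains a nearest-neighbors classifier on invented customer data (the features, labels, and dollar threshold are all illustrative, and scikit-learn is assumed to be available):

```python
# Classification sketch: label a new customer by similarity to known ones.
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical training data: [visits_per_month, avg_order_value]
X_train = [[2, 15.0], [8, 35.0], [1, 10.0], [12, 50.0], [6, 22.0]]
y_train = [0, 1, 0, 1, 1]  # 1 = spent more than $20 online, 0 = did not

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)

# Classify a new customer based on the most similar known customers.
print(clf.predict([[7, 28.0]]))  # e.g. [1] -> likely to spend over $20
```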
8. Cross-validation. Cross-validation is a method for validating the stability, or accuracy, of machine learning models. Although there are many varieties of cross-validation, the most basic involves splitting a dataset into parts, training the algorithm on one part, and then applying it to the held-out part. Because you know what result the model should produce, you can assess its validity.
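A minimal sketch of k-fold cross-validation, assuming scikit-learn; the bundled digits dataset and a logistic regression stand in for any data and model:

```python
# Cross-validation sketch: rotate which fold is held out for validation.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)
model = LogisticRegression(max_iter=5000)

# Split the data into 5 folds; train on 4, validate on the held-out fold,
# and repeat so every fold serves once as the validation set.
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())  # average accuracy across the five folds
```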
9. Clustering. Clustering is like classification, but without predefined categories. A clustering algorithm takes in unlabeled data, finds similarities within it, and groups the data points into categories on its own.
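A minimal sketch with k-means, one common clustering algorithm (the two synthetic blobs of points are invented; scikit-learn and NumPy are assumed):

```python
# Clustering sketch: the algorithm receives unlabeled points and groups
# them by similarity on its own.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two synthetic blobs of 2-D points, with no labels attached.
points = np.vstack([rng.normal(0, 0.5, (50, 2)),
                    rng.normal(5, 0.5, (50, 2))])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_[:5])        # cluster assignment discovered per point
print(kmeans.cluster_centers_)   # the two group centers it found
```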
10. Deep learning. A more advanced form of machine learning, deep learning refers to systems with multiple input/output layers, as opposed to shallow systems with a single input/output layer. In deep learning, data passes through several successive rounds of input and output, which helps the system solve complex, real-world data problems.
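As a toy illustration of "multiple layers" only (not a training loop, and with made-up random weights), here is data flowing through several stacked layers, each layer's output feeding the next layer's input:

```python
# Deep network sketch: stacked layers, each output feeding the next input.
import numpy as np

def layer(x, w, b):
    return np.maximum(0, x @ w + b)  # linear step + ReLU activation

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4))                    # input features
w1, b1 = rng.normal(size=(4, 8)), np.zeros(8)  # hidden layer 1
w2, b2 = rng.normal(size=(8, 8)), np.zeros(8)  # hidden layer 2
w_out, b_out = rng.normal(size=(8, 1)), np.zeros(1)

h = layer(layer(x, w1, b1), w2, b2)  # several layers stacked = "deep"
print(h @ w_out + b_out)             # network output
```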
11. Linear regression. Linear regression models the relationship between two variables by fitting a linear equation to the observed data. Once fitted, an unknown variable can be predicted from a known, related variable. A simple example is the relationship between an individual's height and weight.
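A minimal sketch of that height/weight example, with invented measurements and scikit-learn assumed:

```python
# Linear regression sketch: fit a line to (height, weight) pairs, then
# predict weight for a new height.
from sklearn.linear_model import LinearRegression

heights_cm = [[150], [160], [170], [180], [190]]
weights_kg = [52, 60, 68, 77, 85]

reg = LinearRegression().fit(heights_cm, weights_kg)
print(reg.coef_, reg.intercept_)  # slope and intercept of the fitted line
print(reg.predict([[175]]))       # predicted weight at 175 cm
```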
12. A/B testing. Commonly used in product development, an A/B test is a randomized experiment in which two variants are compared to determine the better course of action. For example, Google famously tested many shades of blue to determine which one earned the most clicks.
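One common way to analyze a finished A/B test is a two-proportion z-test; the sketch below uses invented click counts and assumes the statsmodels library:

```python
# A/B test analysis sketch: did variant B really get more clicks than A?
from statsmodels.stats.proportion import proportions_ztest

clicks = [310, 355]            # clicks for variant A and variant B
impressions = [10000, 10000]   # times each variant was shown

stat, p_value = proportions_ztest(clicks, impressions)
print(p_value)  # a small p-value suggests the variants genuinely differ
```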
13. Hypothesis testing. Hypothesis testing is the use of statistics to determine the probability that the null hypothesis, the default assumption that there is no effect, is correct. It is often used in clinical research.
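A minimal sketch with SciPy: a two-sample t-test on invented treatment and control measurements, where the null hypothesis is that both groups share the same mean:

```python
# Hypothesis test sketch: is the difference between groups just chance?
from scipy import stats

control   = [5.1, 4.9, 5.0, 5.2, 4.8, 5.1]
treatment = [5.6, 5.4, 5.7, 5.3, 5.5, 5.8]

t_stat, p_value = stats.ttest_ind(control, treatment)
print(p_value)  # a small p-value is evidence against the null hypothesis
```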
14. Statistical power. Statistical power is the likelihood that a study will correctly reject the null hypothesis when the null hypothesis is in fact false. In other words, it is the probability that the study will detect an effect when there truly is one to detect. High statistical power means you are less likely to conclude erroneously that a variable has no effect.
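Power can be estimated by simulation: run many hypothetical experiments where a true effect of the assumed size exists, and count how often the test detects it. All parameters below (sample size, effect size, alpha) are illustrative:

```python
# Power-by-simulation sketch: fraction of experiments that detect a real effect.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, effect, alpha, trials = 30, 0.5, 0.05, 2000

detected = 0
for _ in range(trials):
    a = rng.normal(0.0, 1.0, n)     # control group
    b = rng.normal(effect, 1.0, n)  # treatment group with a real effect
    if stats.ttest_ind(a, b).pvalue < alpha:
        detected += 1

print(detected / trials)  # estimated power (roughly 0.5 at this n and effect)
```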
15. Standard error. The standard error is a measure of the statistical accuracy of an estimate. As the sample size grows, the standard error decreases.
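A minimal sketch of the standard error of the mean, s / sqrt(n), on simulated samples; note how quadrupling the sample size roughly halves it:

```python
# Standard error sketch: bigger samples give more precise mean estimates.
import numpy as np

rng = np.random.default_rng(0)
for n in (100, 400, 1600):
    sample = rng.normal(50, 10, n)            # simulated measurements
    se = sample.std(ddof=1) / np.sqrt(n)      # standard error of the mean
    print(n, round(se, 3))
```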
16. Causal inference. Causal inference is the process of testing whether a cause-and-effect relationship exists in a given situation, the goal of much data analysis in the social and health sciences. Such analyses generally require not only good data and algorithms but also subject-matter expertise.
17. Exploratory data analysis (EDA). EDA is often the first step in analyzing a dataset. With EDA techniques, data scientists can summarize a dataset's key features and inform the development of a more complex model, or the next logical step.
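A minimal sketch of a first EDA pass with pandas, on a small invented dataset: summary statistics, missing values, and correlations:

```python
# EDA sketch: quick summaries before any modeling begins.
import pandas as pd

df = pd.DataFrame({
    "age":   [23, 31, 45, 29, 52, 38],
    "spend": [18.5, 42.0, 55.3, 21.7, 60.1, 35.9],
})

print(df.describe())    # count, mean, std, quartiles per column
print(df.isna().sum())  # missing values per column
print(df.corr())        # pairwise correlations between columns
```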
18. Data visualization. A key component of data science, data visualizations are visual representations of text-based information that support better detection and recognition of patterns, trends, and correlations. They help people understand the significance of data by placing it in a visual context.
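A minimal sketch with matplotlib: the same invented monthly sales figures read far faster as a chart than as a column of numbers:

```python
# Visualization sketch: a trend is obvious in a plot, hidden in raw text.
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales = [120, 135, 128, 160, 172, 190]   # invented figures

plt.plot(months, sales, marker="o")
plt.title("Monthly sales")
plt.ylabel("Units sold")
plt.show()
```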
19. R. R is a programming language and software environment for statistical computing. The R language is widely used among statisticians and data miners for developing statistical software and analyzing data.
20. Python. Python is a general-purpose programming language that is often used to manipulate and store data. Many high-traffic websites, such as YouTube, are built using Python.
21. SQL. Structured Query Language, or SQL, is another programming language, used to perform tasks such as updating or retrieving data in a database.
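A minimal sketch of SQL at work, run through Python's built-in sqlite3 module; the table and rows are invented for illustration:

```python
# SQL sketch: create a table, insert rows, then retrieve data with a query.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, total_spend REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [("Ana", 35.0), ("Ben", 12.5), ("Cal", 48.0)])

# Retrieve data with a SQL query: customers who spent more than $20.
rows = conn.execute(
    "SELECT name, total_spend FROM customers WHERE total_spend > 20"
).fetchall()
print(rows)  # [('Ana', 35.0), ('Cal', 48.0)]
```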
22. ETL. ETL is a type of data integration that refers to three stages (extract, transform, load) used to blend data from multiple sources. It is often used to build a data warehouse. An important aspect of data warehousing is that it consolidates data from multiple sources and transforms it into a common, useful format. For example, ETL normalizes data from different departments and business processes so that it is consistent and standardized.
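A minimal sketch of the three stages with invented department data: extract from two sources with different schemas, transform both into one common format, and load into a single "warehouse" list standing in for a real database:

```python
# ETL sketch: extract -> transform to a common schema -> load.
def extract():
    sales = [{"customer": "ana", "amount_usd": "35.0"}]     # source 1
    support = [{"client_name": "Ben", "spend": 12.5}]       # source 2
    return sales, support

def transform(sales, support):
    # Normalize both sources to the same schema: name (title case), spend (float).
    rows = [{"name": r["customer"].title(), "spend": float(r["amount_usd"])}
            for r in sales]
    rows += [{"name": r["client_name"].title(), "spend": float(r["spend"])}
             for r in support]
    return rows

def load(rows, warehouse):
    warehouse.extend(rows)  # stand-in for writing to a data warehouse

warehouse = []
load(transform(*extract()), warehouse)
print(warehouse)
```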
23. GitHub. GitHub is a code-sharing and publishing service, as well as a community for developers. It provides access control and several collaboration features, such as bug tracking, feature requests, task management, and wikis for every project. GitHub offers both private repositories and free accounts, which are commonly used to host open-source software projects.
24. Data models. Data models define how datasets are connected to each other and how they are processed and stored within a system. Because a data model shows the structure of a database, including its relationships and constraints, it helps data scientists understand how the data can best be stored and manipulated.
25. Data warehouse. A data warehouse is a repository where all the data collected by an organization is stored and used as a guide for making management decisions.