Friday, August 24, 2018

Introduction to K-Means Clustering in Data Science

Introduction

K-means clustering is a type of unsupervised learning that is used when the data is unlabeled (i.e., there is no information about categories or groups). The goal of the algorithm is to find groups in the data, with the number of groups represented by the variable K. It works iteratively to assign each data point to one of the K groups based on the features provided.

Data points are grouped by feature similarity. The results of the K-means clustering algorithm are:
1. The centroids of the K clusters, which can be used to label new data
2. Labels for the training data (each data point is assigned to a single cluster)
Rather than defining groups before looking at the data, clustering allows you to find and analyze the groups that emerge from it. The "Select K" section below describes how the number of groups can be determined.

Each cluster centroid is a set of feature values that defines the resulting group. Examining the feature values of a centroid can be used to interpret what kind of group each cluster represents.

This introduction to the K-means algorithm covers:
·        Typical business examples
·        The steps required to implement the algorithm
·        A Python example using traffic information

Business Uses

The K-means clustering algorithm is used to find groups that are not explicitly labeled in the data. This can be used to confirm business assumptions about which types of groups exist or to identify unknown groups in complex data sets. Once the algorithm has been run and the groups are defined, any new data point can easily be assigned to the correct group.
This is a versatile algorithm that can be used for any type of grouping. Some example use cases are:

Behavioral segmentation:

1. Segment by purchase history
2. Segment by activity on an application, website, or platform
3. Define personas based on interests
4. Create profiles based on activity monitoring

Inventory categorization:
·        Group inventory by sales activity
·        Group inventory by manufacturing metrics

Sorting sensor measurements:
·        Detect activity types in motion sensors
·        Group images
·        Separate audio
·        Identify groups in health monitoring

Detecting bots or anomalies:
·        Separate valid activity groups from bots
·        Group valid activity to clean up outlier detection

In addition, monitoring whether a tracked data point switches between groups over time can be used to detect meaningful changes in the data.

Algorithm
The K-means clustering algorithm uses iterative refinement to produce a final result. The inputs to the algorithm are the number of clusters K and the data set. The data set is a collection of features for each data point. The algorithm starts with initial estimates for the K centroids, which can either be randomly generated or randomly selected from the data set. It then iterates between two steps:

Step 1:

Each centroid defines one of the clusters. In this step, each data point is assigned to its nearest centroid, based on the squared Euclidean distance. Formally, if c_i is a centroid in the set of centroids C, then each data point x is assigned to a cluster based on
$$ \underset{c_i \in C}{\arg\min} \; dist(c_i, x)^2 $$
where dist(·) is the standard (L2) Euclidean distance. Let S_i be the set of data points assigned to the i-th cluster centroid.
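The assignment step can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names `squared_distance` and `assign_points` and the sample points are made up for this example.

```python
def squared_distance(a, b):
    """Squared Euclidean (L2) distance between two equal-length points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def assign_points(points, centroids):
    """Return, for each point x, the index i of its nearest centroid c_i."""
    return [
        min(range(len(centroids)), key=lambda i: squared_distance(centroids[i], x))
        for x in points
    ]


# Hypothetical 2-D data and centroids, invented for illustration.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
centroids = [(1.0, 1.5), (8.5, 9.0)]

labels = assign_points(points, centroids)
print(labels)  # [0, 0, 1, 1]
```

The same function can label new data once the centroids are known, since a new point simply takes the index of its nearest centroid.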

Step 2:

Centroid update:
In this step, the centroids are recomputed. This is done by taking the mean of all data points assigned to that centroid's cluster.
$$ c_i = \frac{1}{|S_i|} \sum_{x_i \in S_i} x_i $$
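As a rough sketch of the update step, the function below recomputes each centroid as the mean of the points currently assigned to it. The name `update_centroids`, the sample data, and the handling of empty clusters (keeping the previous centroid) are assumptions made for this example; the text above does not say what to do with an empty cluster.

```python
def update_centroids(points, labels, centroids):
    """Recompute each centroid c_i as the mean of the points in S_i."""
    new_centroids = []
    for i, old in enumerate(centroids):
        members = [x for x, label in zip(points, labels) if label == i]  # S_i
        if not members:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(tuple(old))
            continue
        # Mean of each coordinate over the members of the cluster.
        new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
    return new_centroids


# Hypothetical data, labels, and starting centroids.
points = [(1.0, 1.0), (2.0, 3.0), (8.0, 8.0), (10.0, 10.0)]
labels = [0, 0, 1, 1]
centroids = [(0.0, 0.0), (5.0, 5.0)]

new_centroids = update_centroids(points, labels, centroids)
print(new_centroids)  # [(1.5, 2.0), (9.0, 9.0)]
```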
The algorithm iterates between steps 1 and 2 until a stopping criterion is met (i.e., no data points change clusters, the sum of the distances is minimized, or the maximum number of iterations is reached).
This algorithm is guaranteed to converge to a result. The result may be a local optimum (i.e., not necessarily the best possible result), which means that running the algorithm more than once with different randomized starting centroids may give a better result.
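The loop and the restart idea can be put together in one short sketch. It assumes the initial centroids are sampled from the data, stops when no label changes or when `max_iter` is reached, and keeps the run with the lowest total squared distance. The names `kmeans` and `best_of`, the default values, and the sample data are illustrative choices, not part of the original text.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, max_iter=100, rng=random):
    """One K-means run; returns (centroids, labels, total squared distance)."""
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    labels = None
    for _ in range(max_iter):  # stop: maximum number of iterations reached
        new_labels = [
            min(range(k), key=lambda i: sq_dist(centroids[i], x)) for x in points
        ]
        if new_labels == labels:  # stop: no data point changed cluster
            break
        labels = new_labels
        for i in range(k):
            members = [x for x, lab in zip(points, labels) if lab == i]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    cost = sum(sq_dist(centroids[lab], x) for x, lab in zip(points, labels))
    return centroids, labels, cost


def best_of(points, k, n_runs=10, seed=0):
    """Repeat K-means from different random starts; keep the lowest-cost run."""
    rng = random.Random(seed)
    runs = [kmeans(points, k, rng=rng) for _ in range(n_runs)]
    return min(runs, key=lambda run: run[2])


# Hypothetical data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels, cost = best_of(points, k=2)
print(labels, round(cost, 3))
```

Comparing runs by total squared distance is one reasonable way to pick among local optima, because that is the quantity the two steps reduce.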


Select K
The algorithm described above finds the clusters and data labels for a particular, pre-chosen K. To find the number of clusters in the data, the user needs to run the K-means algorithm for a range of K values and compare the results. In general, there is no method for determining the exact value of K, but a reasonable estimate can be obtained using the following techniques.

One of the metrics commonly used to compare results across different values of K is the mean distance between data points and their cluster centroid. Since increasing the number of clusters always reduces the distance to the data points, increasing K always decreases this metric, to the extreme of reaching zero when K equals the number of data points. Therefore, this metric cannot be used as the sole target. Instead, the mean distance to the centroid is plotted as a function of K, and the "elbow point", where the rate of decrease changes sharply, can be used to roughly determine K.
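A small sketch of how the values for such an elbow plot might be computed is given below. It assumes a compact K-means loop with centroids sampled from the data and takes the best of several runs per K. The function name `mean_distance_for_k`, its defaults, and the three-group sample data are invented for illustration, and `math.dist` requires Python 3.8 or later.

```python
import math
import random


def mean_distance_for_k(points, k, n_runs=20, max_iter=50, seed=0):
    """Mean distance from each point to its centroid, best of n_runs runs."""
    rng = random.Random(seed)
    best = math.inf
    for _ in range(n_runs):
        centroids = rng.sample(points, k)
        for _ in range(max_iter):
            # Assignment step: nearest centroid for every point.
            labels = [
                min(range(k), key=lambda i: math.dist(centroids[i], x))
                for x in points
            ]
            # Update step: mean of each cluster (empty clusters keep their centroid).
            updated = list(centroids)
            for i in range(k):
                members = [x for x, lab in zip(points, labels) if lab == i]
                if members:
                    updated[i] = tuple(sum(c) / len(members) for c in zip(*members))
            if updated == centroids:  # centroids stopped moving
                break
            centroids = updated
        mean_dist = sum(
            math.dist(centroids[lab], x) for x, lab in zip(points, labels)
        ) / len(points)
        best = min(best, mean_dist)
    return best


# Hypothetical data: three groups of three points each.
points = [
    (0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
    (10.0, 0.0), (11.0, 0.0), (10.0, 1.0),
    (5.0, 9.0), (6.0, 9.0), (5.0, 10.0),
]
curve = {k: mean_distance_for_k(points, k) for k in range(1, len(points) + 1)}
for k, value in curve.items():
    print(k, round(value, 3))
```

Plotting `curve` against K and looking for the point where the decrease flattens gives the elbow estimate; with K equal to the number of points, every point is its own centroid and the metric is zero.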

There are a number of other techniques for validating K, including cross-validation, information criteria, the information-theoretic jump method, the silhouette method, and the G-means algorithm. In addition, monitoring the distribution of data points across groups provides insight into how the algorithm splits the data for each K.
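As one example of these techniques, the silhouette method can be sketched with the standard definition of the silhouette coefficient: for each point, a is the mean distance to the other points in its own cluster, b is the mean distance to the points of the nearest other cluster, and the score is (b - a) / max(a, b). The function below is an illustrative implementation under that definition; the name `silhouette_score`, the sample data, and the two labelings are assumptions made for this example.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (needs at least 2 clusters)."""
    clusters = {}
    for x, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(x)
    scores = []
    for x, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention: a singleton cluster scores 0
            continue
        # a: mean distance to the other points in the same cluster
        # (the distance from x to itself is 0, so it does not affect the sum).
        a = sum(math.dist(x, y) for y in own) / (len(own) - 1)
        # b: mean distance to the points of the nearest other cluster.
        b = min(
            sum(math.dist(x, y) for y in other) / len(other)
            for key, other in clusters.items()
            if key != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical data: two groups, scored under a sensible and a mixed-up labeling.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 0.0), (11.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 0, 1, 1, 1])
mixed = silhouette_score(points, [0, 1, 0, 1, 0, 1])
print(round(good, 3), round(mixed, 3))
```

Scores close to 1 indicate compact, well-separated clusters, so comparing the mean score across several values of K is one way to choose between them.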
