K-Means (Part 1): How to Cluster 20 Asian Football Teams? (Practical Data Analysis 20)
Learn K-Means clustering with a practical guide on grouping 20 Asian football teams. Understand centroids, distance metrics, and algorithm principles in depth.
Welcome to the "Practical Data Analysis" Series.
Today, I’ll guide you through learning K-Means.
K-Means is an unsupervised learning algorithm designed to solve clustering problems.
K is the number of clusters, and Means refers to the mean of the points in each cluster, which serves as that cluster's centroid. The essence of the algorithm is to find the centroids of the K clusters. Once the centroids are fixed, every point belongs to the cluster of its nearest centroid, and the clustering is complete.
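To make the idea of a centroid concrete, here is a minimal sketch in plain Python. The points, the group names, and the helper functions `centroid` and `nearest_centroid` are all hypothetical, chosen only for illustration, and Euclidean distance is assumed as the distance measure.

```python
import math

def centroid(points):
    """Return the coordinate-wise mean of a list of equal-length points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to `point` (Euclidean distance)."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))

# Hypothetical 2-D points, already split into K = 2 groups for illustration.
group_a = [(1.0, 1.0), (2.0, 1.0), (1.5, 2.5)]
group_b = [(8.0, 8.0), (9.0, 10.0)]

# Each centroid is simply the mean of the points in its group.
centroids = [centroid(group_a), centroid(group_b)]
print(centroids)

# A new point joins the cluster whose centroid is nearest.
print(nearest_centroid((2.0, 2.0), centroids))
```

This only shows what a centroid is and how a point is matched to one. It does not show how K-Means finds the centroids in the first place.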
Let’s consider the following three questions together:
How are the centroids of the K clusters determined?
How are other points assigned to these clusters?
How does K-Means differ from KNN?
If you understand these three questions, you’ll have a solid grasp of the principles of K-Means.
Imagine a Scenario
Let’s assume we have 20 Asian football teams and want to group them into 3 tiers based on their performance. How can this be done?
How K-Means Works
You might already have your own judgment regarding the levels of Asian football teams.
For example:
Who belongs to the top tier? You might say Iran or South Korea.
What about the second tier? Maybe China.
And the third tier? Perhaps Vietnam.
These judgments are based on experience. Iran, China, and Vietnam could represent the three levels, acting as the centroids for each cluster.
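As a rough sketch of what "acting as the centroids" could mean in practice, the snippet below treats the three representative teams as starting centroids and attaches each remaining team to the closest one. Everything numeric here is an assumption: the scores are made-up placeholders (not real rankings or results), "Team A" to "Team C" are hypothetical teams, and `assign_to_seed` is an illustrative helper rather than part of any library.

```python
# Made-up placeholder scores (lower = stronger); NOT real rankings.
seeds = {"Iran": 1.0, "China": 5.0, "Vietnam": 9.0}

# Hypothetical teams that still need a tier.
others = {"Team A": 2.0, "Team B": 6.5, "Team C": 8.0}

def assign_to_seed(score, seeds):
    """Return the name of the seed team whose score is closest to `score`."""
    return min(seeds, key=lambda name: abs(seeds[name] - score))

# Attach every remaining team to its nearest representative.
tiers = {team: assign_to_seed(score, seeds) for team, score in others.items()}

for team, seed in tiers.items():
    print(f"{team} is grouped with {seed}")
```

With these placeholder values, each hypothetical team simply lands in the tier of the representative it most resembles. Experience-based picks like this are only a starting guess.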
So, how do we determine the centroids of K clusters?