Data · Level 5
5.1 Clustering
Discover an algorithm for grouping similar data points, and use it to classify cities by their climate.
Finding Clusters in Data
K-Means Algorithm
Inertia
Number of Clusters
Normalizing Variables
Handling Outliers
Transforming Variables
Course description
Use global temperature and precipitation data to explore clustering and classification of cities by their climate. You’ll examine the inner workings of the k-means clustering algorithm, implement data processing steps, and explore key decisions needed to get meaningful clusters out of real-world data.
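To give a feel for those inner workings, here is a minimal sketch of k-means in plain Python. It is not the course's own implementation. The function names (`squared_distance`, `assign`, `k_means`, `inertia`), the toy points, and the choice of starting centroids are all assumptions made for illustration. The sketch alternates between assigning each point to its nearest centroid by Euclidean distance and moving each centroid to the mean of its assigned points, then reports inertia as the sum of squared distances from points to their centroids.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]


def k_means(points, initial_centroids, max_iters=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iters):
        labels = assign(points, centroids)
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # Move the centroid to the mean of its assigned points.
                new_centroids.append(
                    tuple(sum(coords) / len(members) for coords in zip(*members))
                )
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(old)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Recompute labels so they always match the returned centroids.
    return assign(points, centroids), centroids


def inertia(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))


# Toy data invented for illustration (not the course's climate dataset).
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
labels, centroids = k_means(points, initial_centroids=[points[0], points[2]])
total_inertia = inertia(points, labels, centroids)
```

The starting centroids are passed in explicitly so the result is reproducible. In practice they are often chosen at random, and different starting points can lead to different clusters.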
Topics covered
- Clusters
- Euclidean Distance
- The K-Means Algorithm
- Inertia
- Elbow Plots
- Normalization
- Outliers
- Variable Transformations
- Log-Transforms
Prerequisites and next steps
This course assumes comfort with reading scatter plots and histograms and applying basic mathematical operations to columns of data. Familiarity with logarithms and Pythagoras’ Theorem is helpful but not essential.