Data · Level 5

5.1 Clustering

Discover an algorithm for grouping similar data points, and use it to classify cities by their climate.

Finding Clusters in Data

K-Means Algorithm

Inertia

Number of Clusters

Normalizing Variables

Handling Outliers

Transforming Variables


Course description

Use global temperature and precipitation data to cluster and classify cities by their climate. You’ll examine the inner workings of the k-means clustering algorithm, implement data processing steps, and work through the key decisions needed to get meaningful clusters out of real-world data.
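To give a feel for what this involves, here is a minimal sketch of k-means in plain Python. It is an illustration, not the course's own implementation. The six data points are made-up (mean temperature in °C, annual precipitation in mm) pairs, not real city data, and the helper names (zscore_columns, squared_distance, assign, k_means) are hypothetical. The sketch assumes z-score normalization and hand-picked starting centroids; other choices are possible.

```python
import math

def zscore_columns(rows):
    """Rescale each column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for a constant column to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds))
            for row in rows]

def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points]

def k_means(points, centroids, max_iter=100):
    """Alternate assignment and centroid updates until nothing changes."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # New centroid is the mean of the points assigned to it.
                updated.append(tuple(sum(c) / len(members)
                                     for c in zip(*members)))
            else:
                # Keep an empty cluster's centroid where it was.
                updated.append(old)
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    # Inertia: total squared distance from each point to its centroid.
    inertia = sum(squared_distance(p, centroids[lab])
                  for p, lab in zip(points, labels))
    return labels, centroids, inertia

# Made-up example values: (mean temperature in °C, annual precipitation in mm).
cities = [(26.0, 2200.0), (27.5, 2500.0), (25.0, 1900.0),
          (8.0, 600.0), (6.5, 550.0), (9.0, 700.0)]

# Normalize so precipitation's larger numbers do not dominate the distance.
scaled = zscore_columns(cities)

# Assumption: start from two hand-picked points instead of random ones.
labels, centroids, inertia = k_means(scaled, [scaled[0], scaled[3]])
print(labels)   # [0, 0, 0, 1, 1, 1]
print(inertia)
```

Running k_means for several values of k and plotting the resulting inertia against k gives the kind of elbow plot listed in the topics below.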


Topics covered

  • Clusters
  • Euclidean Distance
  • The K-Means Algorithm
  • Inertia
  • Elbow Plots
  • Normalization
  • Outliers
  • Variable Transformations
  • Log-Transforms

Prerequisites and next steps

This course assumes you are comfortable reading scatter plots and histograms and applying basic mathematical operations to columns of data. Familiarity with logarithms and the Pythagorean theorem is helpful but not essential.