Customer Segmentation Using K-Means Clustering

Mar 21, 2021

As markets grow and customer demands increase, a business has to keep pace with both the market and its customers.

The code for this analysis is available on my GitHub.

Introduction:

Imagine walking into a store, or scrolling through various e-commerce platforms, to buy a product you want, not finding it, and then walking out of the store or signing out of the platform dissatisfied. The shop or e-commerce business loses a customer and risks a poorer rating, and this effect can snowball.

To avoid losses in such circumstances, it is necessary to segment customers based on various parameters in order to serve the right product or service at the right time to the right customers.

Traditionally, customer segmentation was a challenging and time-consuming task that demanded hours of manually poring over data tables and collecting data. Machine learning has helped to overcome these hurdles and makes it possible to direct specific marketing strategies at targeted groups of people.

K-Means Clustering Algorithm:

One way to segment customers using machine learning is the K-means algorithm:

K-means clustering is a type of unsupervised learning, used when you have unlabeled data (i.e., data without defined categories or groups). It is an iterative algorithm that partitions the unlabeled data into K distinct, non-overlapping clusters. Each data point belongs to exactly one cluster.

The algorithm tries to make the data points within a cluster as similar as possible while keeping the clusters as far apart as possible. It does this by minimizing the sum of the squared distances between each data point and the centroid of its cluster, so that each cluster ends up as homogeneous as possible.
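The assign-then-update loop described above can be sketched in a few lines of NumPy. This is a minimal illustration and not the code from the original analysis: the function name `kmeans`, its parameters, and the toy data are all assumptions made for the example.

```python
import numpy as np


def nearest_centroid(points, centroids):
    """Return, for every point, the index of its closest centroid."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(points, k, n_iters=100, seed=0):
    """Hypothetical helper: plain K-means (assign points, then move centroids)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest centroid.
        labels = nearest_centroid(points, centroids)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    labels = nearest_centroid(points, centroids)
    # Sum of squared distances between each point and its cluster centroid.
    inertia = float(((points - centroids[labels]) ** 2).sum())
    return labels, centroids, inertia


# Toy data (made up for illustration): two well-separated groups of 50 points.
rng = np.random.default_rng(42)
toy = np.vstack([
    rng.normal(loc=[25, 80], scale=2.0, size=(50, 2)),
    rng.normal(loc=[60, 20], scale=2.0, size=(50, 2)),
])
labels, centroids, inertia = kmeans(toy, k=2)
print(centroids.round(1))
print(f"sum of squared distances: {inertia:.1f}")
```

In practice a library implementation such as scikit-learn's `KMeans` is usually preferable, since it adds better initialization and multiple restarts.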

Project Overview:

The analysis uses a dataset available on Kaggle: https://www.kaggle.com/imkushwaha/customersegmentationdataset

The dataset has 3 main variables:

1. Age: Age of the customers in the mall,
2. Annual Income (k$): Annual income in thousands of dollars,
3. Spending Score (1–100): A score given to each customer depending on their spending power.

K-means Clustering Process:

1. Let's take Age and Spending Score and cluster the dataset accordingly.

The optimal number of clusters for the dataset is 4.
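The article does not say how the number of clusters was chosen. One common approach is the elbow method: fit K-means for a range of K values and look for the point where the within-cluster sum of squares stops dropping sharply. The sketch below assumes scikit-learn is installed and uses synthetic stand-in data, not the Kaggle dataset, so the numbers it prints will not match the analysis.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic stand-in for the (Age, Spending Score) columns: four loose groups.
# These group centers are invented for illustration only.
group_centers = np.array([[25, 80], [30, 40], [50, 50], [60, 15]])
X = np.vstack([rng.normal(c, 4.0, size=(50, 2)) for c in group_centers])

# Fit K-means for K = 1..10 and record the within-cluster sum of squares,
# which scikit-learn exposes as the `inertia_` attribute.
inertias = {}
for k in range(1, 11):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(f"k={k:2d}  inertia={value:10.1f}")
```

Plotting these values against K and picking the "elbow" of the curve is a judgment call; other criteria, such as the silhouette score, can be used alongside it.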

The data points are clustered into 4 groups, each shown in a different color, with the cluster centers marked in black.

2. Let's take Annual Income and Spending Score and cluster the dataset accordingly.

The optimal number of clusters for the dataset is 5.

The data points are clustered into 5 groups, each shown in a different color, with the cluster centers marked in black.
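A plot of this kind could be produced roughly as follows. This is a sketch under assumptions: it requires scikit-learn and matplotlib, the data is synthetic (five invented groups, not the Kaggle dataset), and the output filename `income_vs_score_clusters.png` is hypothetical.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to use an interactive window
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
# Synthetic (Annual Income, Spending Score) pairs around five invented centers.
group_centers = np.array([[25, 20], [25, 80], [55, 50], [88, 17], [88, 82]])
X = np.vstack([rng.normal(c, 5.0, size=(40, 2)) for c in group_centers])

model = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)

fig, ax = plt.subplots(figsize=(6, 4))
# Color each point by its assigned cluster.
ax.scatter(X[:, 0], X[:, 1], c=model.labels_, cmap="tab10", s=20)
# Overlay the cluster centers in black.
ax.scatter(
    model.cluster_centers_[:, 0],
    model.cluster_centers_[:, 1],
    c="black",
    marker="X",
    s=120,
    label="Cluster centers",
)
ax.set_xlabel("Annual Income (k$)")
ax.set_ylabel("Spending Score (1-100)")
ax.legend()
fig.savefig("income_vs_score_clusters.png", dpi=100)
```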

3. Let's take all three variables, Age, Annual Income, and Spending Score, and cluster the dataset accordingly.

The optimal number of clusters for the dataset is 5.

The data points are clustered into 5 distinct groups, each shown in a different color.
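Once customers are assigned to clusters, a per-cluster summary helps turn the groups into usable segments. The sketch below assumes scikit-learn and uses randomly generated placeholder values for the three variables, so the profiles it prints are meaningless beyond showing the mechanics. It clusters the raw values; whether the original analysis scaled the features first is not stated.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
n_customers = 200
# Placeholder data, NOT the Kaggle dataset: one column per variable.
X = np.column_stack([
    rng.integers(18, 71, size=n_customers),   # Age
    rng.integers(15, 138, size=n_customers),  # Annual Income (k$)
    rng.integers(1, 101, size=n_customers),   # Spending Score (1-100)
]).astype(float)

model = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)
labels = model.labels_

# Profile each segment by its size and the mean of each variable.
for j in range(5):
    members = X[labels == j]
    age, income, score = members.mean(axis=0)
    print(
        f"cluster {j}: n={len(members):3d}  "
        f"mean age={age:5.1f}  mean income={income:6.1f}k$  mean score={score:5.1f}"
    )
```

Because income spans a wider numeric range than age or spending score, it can dominate the distance calculation on raw values; standardizing the columns before clustering is a common alternative.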

K-means clustering is thus a fast and efficient machine learning algorithm for segmenting customers on various parameters, helping you understand their needs and grow your business.

This, in turn, can grow your base of happy and satisfied customers, which would help to increase your overall rating.