

Lesson 4: Exploratory Data Analysis (EDA)

Exploratory Data Analysis helps us understand patterns, trends, and relationships in the dataset before we build a churn prediction model. It shows which factors are associated with customer churn and highlights important behavioral signals.

a. Churn Distribution Analysis

We first analyze how many customers churned and how many were retained.

Code:

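import matplotlib.pyplot as plt
import seaborn as sns

# Assumes `df` is the churn DataFrame loaded in the earlier lessons

# Set a consistent plot style for all charts in this lesson
sns.set_theme(style="whitegrid", palette="muted", font_scale=1.2)

# Distribution of Churn
# (hue='Churn' + legend=False applies the palette without a deprecation warning in seaborn 0.13+)
plt.figure(figsize=(6, 4))
sns.countplot(x='Churn', hue='Churn', data=df, palette='Set2', legend=False)
plt.title("Churn Distribution")
plt.show()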

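Output: A bar chart showing the number of customers who churned and the number who stayed.

This tells us whether the classes are balanced or imbalanced. Imbalance matters for modeling: with a small churn class, plain accuracy becomes misleading, so metrics such as precision, recall, or ROC AUC are usually more informative.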
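The count plot shows raw numbers; the degree of imbalance is easier to judge as proportions. A minimal sketch with pandas `value_counts`, assuming `Churn` holds the strings 'Yes'/'No'. The small `toy_df` below is hypothetical sample data; on the real dataset, apply the same two calls to `df['Churn']`.

```python
import pandas as pd

# Hypothetical toy data standing in for the lesson's `df`
toy_df = pd.DataFrame({"Churn": ["No", "No", "No", "Yes", "No", "Yes", "No", "No"]})

# Absolute counts of each class
counts = toy_df["Churn"].value_counts()

# Share of each class (values sum to 1)
proportions = toy_df["Churn"].value_counts(normalize=True)

print(counts)
print(proportions.round(3))
```

Here 25% of the toy customers churned; a share well below 50% like this signals that the minority class needs attention during model evaluation.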

b. Churn by Contract Type

Next, we examine how contract types relate to churn.

Code:

# Churn vs Contract Type
plt.figure(figsize=(6, 4))
sns.countplot(x='Contract', hue='Churn', data=df, palette='Set1')
plt.title("Churn by Contract Type")
plt.show()

Output: Shows churn distribution across different contract types.

This reveals whether customers on month-to-month contracts churn more often than customers on longer-term contracts.
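Grouped counts can mislead when contract types have very different numbers of customers, so it helps to compute the churn rate within each group. A minimal sketch, assuming `Churn` is 'Yes'/'No'; the `toy_df` rows and contract labels are hypothetical illustrations.

```python
import pandas as pd

# Hypothetical toy data; the real `df` has many more rows
toy_df = pd.DataFrame({
    "Contract": ["Month-to-month", "Month-to-month", "Month-to-month",
                 "One year", "One year", "Two year", "Two year", "Two year"],
    "Churn": ["Yes", "Yes", "No", "No", "Yes", "No", "No", "No"],
})

# Churn rate per contract type = mean of a 0/1 churn indicator within each group
churn_rate = (
    toy_df.assign(churned=toy_df["Churn"].eq("Yes").astype(int))
    .groupby("Contract")["churned"]
    .mean()
    .sort_values(ascending=False)
)
print(churn_rate.round(3))
```

Taking the mean of a 0/1 indicator gives the fraction of churners per group directly, which makes groups of different sizes comparable.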

c. Churn vs. Payment Method

We analyze whether payment methods influence churn behavior.

Code:

# Churn vs Payment Method
plt.figure(figsize=(8, 4))
sns.countplot(x='PaymentMethod', hue='Churn', data=df, palette='Set3')
plt.xticks(rotation=45, ha='right')
plt.title("Churn by Payment Method")
plt.tight_layout()  # keep the rotated tick labels inside the figure
plt.show()

Output: Displays churn counts based on different payment methods.

This helps identify payment methods associated with higher churn.
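To compare payment methods on rates rather than counts, a row-normalized cross-tabulation gives the share of churners per method. A minimal sketch using `pd.crosstab` with `normalize="index"`; the `toy_df` rows and payment-method labels are hypothetical.

```python
import pandas as pd

# Hypothetical toy data with illustrative payment-method labels
toy_df = pd.DataFrame({
    "PaymentMethod": ["Electronic check", "Electronic check", "Mailed check",
                      "Mailed check", "Credit card (automatic)", "Credit card (automatic)"],
    "Churn": ["Yes", "Yes", "No", "Yes", "No", "No"],
})

# Row-normalized crosstab: each row (payment method) sums to 1
rates = pd.crosstab(toy_df["PaymentMethod"], toy_df["Churn"], normalize="index")

# Methods with the highest churn share first
print(rates.sort_values("Yes", ascending=False).round(3))
```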

d. Tenure Distribution

Now, we study how customer tenure relates to churn.

Code:

# Tenure Distribution
plt.figure(figsize=(6, 3))
sns.histplot(data=df, x='tenure', hue='Churn', multiple='stack', bins=30, palette='coolwarm')
plt.title("Tenure Distribution by Churn")
plt.show()

Output: A stacked histogram of tenure (how long each customer has been with the company), split by churn status.

This helps determine whether newer, low-tenure customers are more likely to churn.
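The histogram can be backed up with numbers by binning tenure and computing the churn rate per bin. A minimal sketch with `pd.cut`; the bin edges (in months) and the `toy_df` rows are hypothetical choices. `include_lowest=True` keeps a tenure of 0 in the first bin.

```python
import pandas as pd

# Hypothetical toy data: tenure in months and churn status
toy_df = pd.DataFrame({
    "tenure": [1, 3, 5, 10, 14, 20, 30, 45, 60, 70],
    "Churn": ["Yes", "Yes", "No", "Yes", "No", "No", "No", "No", "No", "No"],
})

# Hypothetical bucket edges; bins are right-closed, e.g. (12, 24]
bins = [0, 12, 24, 48, 72]
labels = ["0-12", "13-24", "25-48", "49-72"]
toy_df["tenure_group"] = pd.cut(toy_df["tenure"], bins=bins, labels=labels,
                                include_lowest=True)

# Churn rate per tenure bucket
churn_by_tenure = (
    toy_df["Churn"].eq("Yes").astype(int)
    .groupby(toy_df["tenure_group"], observed=False)
    .mean()
)
print(churn_by_tenure.round(3))
```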

e. Monthly Charges vs. Churn

Next, we compare monthly charges with churn behavior.

Code:

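# Monthly Charges vs Churn
# (hue='Churn' + legend=False applies the palette without a deprecation warning in seaborn 0.13+)
plt.figure(figsize=(6, 3))
sns.boxplot(x='Churn', y='MonthlyCharges', hue='Churn', data=df, palette='Set2', legend=False)
plt.title("Monthly Charges vs Churn")
plt.show()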

Output: Displays the distribution of monthly charges for churned and retained customers.

This helps identify whether higher charges are associated with increased churn.
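The box edges and middle line of a box plot are the quartiles and the median, so the same comparison can be read numerically. A minimal sketch using `groupby(...).describe()`; the `toy_df` values are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: monthly charges and churn status
toy_df = pd.DataFrame({
    "MonthlyCharges": [20.0, 25.0, 30.0, 70.0, 80.0, 90.0, 95.0],
    "Churn": ["No", "No", "No", "No", "Yes", "Yes", "Yes"],
})

# Lower quartile, median, and upper quartile of charges per churn group
summary = toy_df.groupby("Churn")["MonthlyCharges"].describe()[["25%", "50%", "75%"]]
print(summary)
```

If the churned group's median sits clearly above the retained group's, that supports the pattern suggested by the box plot.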

f. Correlation Heatmap

Finally, we analyze correlations between numerical features and churn.

Code:

# Convert 'Churn' ("Yes"/"No") to 1/0 for correlation analysis
df['Churn_numeric'] = (df['Churn'] == 'Yes').astype(int)

# Select numerical features
numeric_features = ['SeniorCitizen', 'tenure', 'MonthlyCharges']
corr_matrix = df[numeric_features + ['Churn_numeric']].corr()

# Plot heatmap (vmin/vmax fix the color scale to the full -1..1 range)
plt.figure(figsize=(6, 4))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt=".2f", vmin=-1, vmax=1)
plt.title("Correlation Heatmap (Numerical Features vs Churn)")
plt.show()

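Output: A heatmap of the pairwise correlation coefficients among the selected numerical features and Churn_numeric.

This helps identify which variables have stronger linear relationships with customer churn before model building. Correlation captures only linear association, so a weak coefficient does not rule out a nonlinear relationship.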
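Because `Churn_numeric` is a 0/1 variable, each Pearson coefficient in its row of the heatmap is a point-biserial correlation: the difference in the feature's mean between churners and non-churners, scaled by the feature's spread. A minimal sketch verifying that equivalence on hypothetical `toy_df` data, using the formula r = (mean₁ − mean₀) / σₓ · √(p·q) with population standard deviation σₓ.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: short-tenure customers churn
toy_df = pd.DataFrame({
    "tenure": [1, 2, 5, 8, 20, 30, 45, 60],
    "Churn": ["Yes", "Yes", "Yes", "No", "No", "No", "No", "No"],
})
y = toy_df["Churn"].eq("Yes").astype(int)  # 1 = churned
x = toy_df["tenure"]

# Pearson correlation as pandas computes it (what the heatmap shows)
r_pearson = x.corr(y)

# Point-biserial formula, with p = share of churners and q = 1 - p
p = y.mean()
q = 1 - p
r_pb = (x[y == 1].mean() - x[y == 0].mean()) / x.std(ddof=0) * np.sqrt(p * q)

print(round(r_pearson, 4), round(r_pb, 4))
```

The negative value here means churners have lower average tenure; the sign of each coefficient in the heatmap can be read the same way.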