Lesson 6: Building The Logistic Regression Model
With the dataset fully preprocessed and converted into numerical format, we now build the churn prediction model using Logistic Regression. This model learns patterns from customer data and predicts whether a customer is likely to churn.
a. Splitting The Dataset
First, we separate features and the target variable, then split the data into training and testing sets.
Code:
from sklearn.model_selection import train_test_split
# Separate features and target
X = df.drop(['Churn', 'customerID'], axis=1)
y = df['Churn']
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
Output:
- X_train, X_test, y_train, and y_test are created.
- 80% of the data is used for training and 20% for testing.
- Stratification preserves the same churn/no-churn proportions in both sets, which matters because churners are typically the minority class.
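The effect of `stratify` can be verified on a small synthetic target (hypothetical data, not the churn dataset): both splits keep the original positive rate.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced target: 30% positives, mimicking churn
y_demo = np.array([1] * 30 + [0] * 70)
X_demo = np.arange(100).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Both splits preserve the original 30% positive rate
print(y_tr.mean(), y_te.mean())  # 0.3 0.3
```

Without `stratify=y_demo`, a small test set can end up with a noticeably different churn rate purely by chance.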
b. Training The Logistic Regression Model
Next, we initialize and train the model.
Code:
from sklearn.linear_model import LogisticRegression
# Initialize the model (max_iter raised so the solver fully converges)
log_reg = LogisticRegression(max_iter=8000)
# Train the model
log_reg.fit(X_train, y_train)
Output:
- The Logistic Regression model is trained on the training dataset.
The model now learns the relationship between customer features and churn probability.
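Beyond hard Yes/No predictions, a fitted scikit-learn model also exposes `predict_proba`, which returns the churn probability itself and is useful for ranking customers by risk. A minimal sketch on synthetic data (the feature matrix here is a stand-in, not the real preprocessed churn features):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))  # stand-in for preprocessed features
y_demo = (X_demo[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

clf = LogisticRegression(max_iter=8000).fit(X_demo, y_demo)

# Each row of predict_proba is [P(no churn), P(churn)]; columns sum to 1
proba = clf.predict_proba(X_demo[:5])
print(proba[:, 1])
```

The second column is the churn probability, so `proba[:, 1] > 0.5` reproduces the default `predict` output.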
c. Model Evaluation
After training, we evaluate the model using standard classification metrics.
- Accuracy: Overall correctness of predictions.
- Precision: How many predicted churns were actually churns.
- Recall: How many actual churns were correctly identified.
- F1 Score: Balance between precision and recall.
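All four metrics derive from the same four confusion-matrix counts. A worked example with hypothetical counts (not this model's results) makes the formulas concrete:

```python
# Hypothetical counts: 50 true negatives, 10 false positives,
# 5 false negatives, 35 true positives
tn, fp, fn, tp = 50, 10, 5, 35

accuracy = (tp + tn) / (tp + tn + fp + fn)          # 85 / 100 = 0.85
precision = tp / (tp + fp)                          # 35 / 45  ≈ 0.778
recall = tp / (tp + fn)                             # 35 / 40  = 0.875
f1 = 2 * precision * recall / (precision + recall)  # ≈ 0.824
print(accuracy, precision, recall, f1)
```

Note that accuracy alone can look good even when recall is poor, which is why all four are reported for an imbalanced problem like churn.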
Code:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
# Predictions
y_pred = log_reg.predict(X_test)
# Evaluation metrics
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:", recall_score(y_test, y_pred))
print("F1 Score:", f1_score(y_test, y_pred))
Output Explanation:
- Accuracy = 1.0: the model correctly predicted 100% of the test samples.
- Precision = 1.0: every customer predicted as churn actually churned; there were no false positives.
- Recall = 1.0: the model identified all actual churn cases; there were no false negatives.
- F1 Score = 1.0: since both precision and recall are perfect, the F1 score is also perfect.
A word of caution: perfect scores on real-world churn data are rare and usually point to data leakage, for example a feature that directly encodes the target (or the target column itself) surviving preprocessing. It is worth double-checking the columns in X before trusting these numbers.
These metrics help us understand how well the model performs.
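A deliberately leaked feature reproduces exactly this kind of perfect score. In the sketch below (entirely synthetic, hypothetical data), the second feature column is the target itself, and held-out accuracy is near 100% for that reason alone:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
noise = rng.normal(size=200)
X_leaky = np.column_stack([noise, y])  # second column IS the target: leakage

X_tr, X_te, y_tr, y_te = train_test_split(
    X_leaky, y, test_size=0.2, random_state=42, stratify=y
)
clf = LogisticRegression(max_iter=8000).fit(X_tr, y_tr)

# Near-perfect score driven entirely by the leaked column
print(accuracy_score(y_te, clf.predict(X_te)))
```

If a real model scores this well, auditing the feature columns is a cheaper first step than celebrating.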
d. Confusion Matrix Visualization
Finally, we visualize prediction results using a confusion matrix.
Code:
# Confusion Matrix
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(4,3))
sns.heatmap(
    cm,
    annot=True,
    fmt="d",
    cmap="Blues",
    xticklabels=['No', 'Yes'],
    yticklabels=['No', 'Yes']
)
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix")
plt.show()

The heatmap shows where the model performs well (the diagonal cells) and where misclassifications occur (the off-diagonal cells).
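The four cells of the matrix map directly onto the counts used by the metrics above. A small hypothetical example (made-up labels, not this model's predictions) shows how to unpack them:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 1 = churn, 0 = no churn
y_true = [0, 0, 0, 1, 1, 0, 1, 1]
y_hat = [0, 1, 0, 1, 0, 0, 1, 1]

# scikit-learn lays the matrix out as [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()
print(tn, fp, fn, tp)  # 3 1 1 3
```

Reading the cells this way, the off-diagonal entries (fp and fn) are exactly the errors that precision and recall penalize.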