Data Preprocessing

Lesson 5: Data Preprocessing

Raw data is rarely suitable for machine learning models as-is. In this lesson, we clean missing values, create meaningful features, and convert categorical variables into numerical format so the churn prediction model can learn effectively.

a. Fixing Missing Values

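First, we make sure the TotalCharges column is numeric and handle missing values. Passing errors='coerce' converts any entry that cannot be parsed as a number into NaN, so those entries show up as missing values in the check that follows.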

Code:

import pandas as pd

# Assumes df is the churn DataFrame loaded earlier in the project

# Convert TotalCharges to numeric; unparseable entries become NaN
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')

# Check missing values
print("Missing values per column:")
print(df.isnull().sum())

# Drop rows with missing TotalCharges
df = df.dropna(subset=['TotalCharges'])

Output:

Displays the number of missing values in each column.
Removes rows where TotalCharges was missing.

This ensures the dataset is clean and ready for feature engineering.
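The effect of errors='coerce' followed by dropna can be seen on a tiny example. This is only a sketch: the toy DataFrame and its values are made up for illustration and are not taken from the churn dataset.

```python
import pandas as pd

# Hypothetical toy data: a whitespace-only string stands in for a missing charge.
toy = pd.DataFrame({
    "customerID": ["A", "B", "C"],
    "TotalCharges": ["29.85", " ", "108.15"],
})

# errors='coerce' turns values that cannot be parsed as numbers into NaN.
toy["TotalCharges"] = pd.to_numeric(toy["TotalCharges"], errors="coerce")
missing_before = int(toy["TotalCharges"].isnull().sum())

# Keep only the rows where TotalCharges is present.
cleaned = toy.dropna(subset=["TotalCharges"])

print(missing_before)  # 1
print(len(cleaned))    # 2
```

Dropping rows is reasonable when only a handful are affected. If many rows were missing, filling the values in might be worth considering instead.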

b. Feature Engineering

Next, we create new features to improve model performance.

Code:

# Tenure buckets
def tenure_group(tenure):
    """Map a tenure in months to a bucket label."""
    if tenure <= 12:
        return '0-12'
    elif tenure <= 24:
        return '12-24'
    elif tenure <= 48:
        return '24-48'
    else:
        return '48+'

df['TenureGroup'] = df['tenure'].apply(tenure_group)

# Average monthly spend (a tenure of 0 is replaced with 1 to avoid dividing by zero)
df['AvgMonthlySpend'] = df['TotalCharges'] / (df['tenure'].replace(0, 1))

# Service count: number of service columns whose value is exactly 'Yes'.
# A column holding any other value adds nothing to the count.
services = ['PhoneService', 'MultipleLines', 'InternetService', 'OnlineSecurity',
            'OnlineBackup', 'DeviceProtection', 'TechSupport', 'StreamingTV', 'StreamingMovies']
df['ServiceCount'] = df[services].apply(lambda x: sum(x == 'Yes'), axis=1)

# Payment stability: 'Automatic' if the payment method mentions "automatic", else 'Manual'
df['PaymentStability'] = df['PaymentMethod'].apply(
    lambda x: 'Automatic' if 'automatic' in x.lower() else 'Manual'
)

Output:

Creates new columns: TenureGroup, AvgMonthlySpend, ServiceCount, and PaymentStability.

These engineered features help the model better understand customer behavior patterns.
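As an alternative to the tenure_group function, the same buckets can be produced with pd.cut. This is a sketch of a swapped-in technique, not the lesson's own method; the toy tenure values and the lower bin edge of -1 are assumptions chosen so that a tenure of 0 falls into the first bucket.

```python
import pandas as pd

# Hypothetical tenure values covering each bucket boundary.
toy = pd.DataFrame({"tenure": [0, 12, 13, 24, 48, 60]})

# pd.cut uses right-inclusive bins by default, which mirrors the
# <= comparisons in tenure_group: (-1, 12], (12, 24], (24, 48], (48, inf].
toy["TenureGroup"] = pd.cut(
    toy["tenure"],
    bins=[-1, 12, 24, 48, float("inf")],
    labels=["0-12", "12-24", "24-48", "48+"],
)

print(toy["TenureGroup"].astype(str).tolist())
# ['0-12', '0-12', '12-24', '12-24', '24-48', '48+']
```

One difference to keep in mind: pd.cut returns a categorical column and yields NaN for values outside the bins (here, anything at or below -1), whereas tenure_group always returns a label.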

c. Encoding Categorical Variables

Most machine learning models expect numerical input, so we convert categorical columns into numbers.

Code:

from sklearn.preprocessing import LabelEncoder

# Encode binary categorical columns
binary_cols = ['gender', 'Partner', 'Dependents', 'PhoneService',
               'PaperlessBilling', 'Churn', 'PaymentStability']

le = LabelEncoder()
for col in binary_cols:
    # fit_transform refits the encoder on each column; codes follow sorted label order
    df[col] = le.fit_transform(df[col])

# One-hot encode multi-class categorical columns
multi_cols = ['MultipleLines', 'InternetService', 'OnlineSecurity', 'OnlineBackup',
              'DeviceProtection', 'TechSupport', 'StreamingTV', 'StreamingMovies',
              'Contract', 'PaymentMethod', 'TenureGroup']

# drop_first=True drops one dummy per column, since it is implied by the others
df = pd.get_dummies(df, columns=multi_cols, drop_first=True)

Output:

Binary columns are converted to 0 and 1.
Multi-class columns are transformed into multiple dummy variables.

After these preprocessing steps, the dataset becomes fully numerical and ready for model training.
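To see what the two encoding steps produce, here is a minimal sketch on a made-up three-row DataFrame. The column values are illustrative assumptions, not rows from the churn dataset.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data with one binary and one multi-class column.
toy = pd.DataFrame({
    "Partner": ["Yes", "No", "Yes"],
    "Contract": ["Month-to-month", "One year", "Two year"],
})

# LabelEncoder sorts the class labels, so 'No' becomes 0 and 'Yes' becomes 1.
le = LabelEncoder()
toy["Partner"] = le.fit_transform(toy["Partner"])
mapping = {str(label): code for code, label in enumerate(le.classes_)}

# drop_first=True drops the first category in sorted order,
# leaving k-1 dummy columns for a column with k categories.
encoded = pd.get_dummies(toy, columns=["Contract"], drop_first=True)

print(mapping)                # {'No': 0, 'Yes': 1}
print(list(encoded.columns))  # ['Partner', 'Contract_One year', 'Contract_Two year']
```

A row with 0 (or False) in both Contract columns is a "Month-to-month" customer, which is why the dropped column carries no extra information. Depending on the pandas version, the dummy columns may be stored as booleans rather than integers; passing dtype=int to get_dummies gives 0/1 integers if that is preferred.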

Customer Churn Prediction Project Using Classification Techniques
