
Lesson 4: Data Cleaning and Preparation

Once we understand the dataset structure, the next step is to clean and prepare the data so it can be used reliably for analysis. Real-world sales data often contains missing values and incorrect data types, which can lead to wrong results if not handled properly.

The first part of data cleaning focuses on handling missing values. In this project, some address-related columns contain missing entries. Instead of removing rows, we fill these missing values with meaningful defaults to keep the dataset complete.
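Before choosing defaults, it can help to check which columns actually contain gaps. The following is a minimal sketch, assuming a small hypothetical DataFrame named sample in place of the project's df; the rows are made up for illustration, and only the column names come from this lesson.

```python
import pandas as pd

# Hypothetical sample rows standing in for the project's sales data.
sample = pd.DataFrame({
    "SALES": [2871.00, 2765.90, 3884.34],
    "ADDRESSLINE2": [None, "Suite 101", None],
    "STATE": ["NY", None, None],
    "POSTALCODE": ["10022", "51100", None],
    "TERRITORY": [None, "EMEA", "EMEA"],
})

# Count missing entries per column, then keep only columns with at least one gap.
missing_counts = sample.isna().sum()
missing_counts = missing_counts[missing_counts > 0]
print(missing_counts)
```

Running the same two lines on the real df shows which columns need a default and how many rows each fill will touch.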

Code to handle missing values:

# Fill ADDRESSLINE2 with empty string
df['ADDRESSLINE2'] = df['ADDRESSLINE2'].fillna('')

# Fill STATE with 'Unknown'
df['STATE'] = df['STATE'].fillna('Unknown')

# Fill POSTALCODE with '00000'
df['POSTALCODE'] = df['POSTALCODE'].fillna('00000')

# Fill TERRITORY with 'Not Assigned'
df['TERRITORY'] = df['TERRITORY'].fillna('Not Assigned')
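The same fills can also be written as a single call by passing a column-to-default mapping to DataFrame.fillna, followed by a quick check that no gaps remain. This is a sketch of an alternative style, not the lesson's own code; sample, defaults, filled, and remaining are hypothetical names, and the rows are invented for illustration.

```python
import pandas as pd

# Hypothetical rows with gaps in the address-related columns.
sample = pd.DataFrame({
    "ADDRESSLINE2": [None, "Suite 101"],
    "STATE": ["NY", None],
    "POSTALCODE": [None, "51100"],
    "TERRITORY": [None, "EMEA"],
})

# One default per column, matching the values used in the lesson.
defaults = {
    "ADDRESSLINE2": "",
    "STATE": "Unknown",
    "POSTALCODE": "00000",
    "TERRITORY": "Not Assigned",
}

# fillna with a dict applies each default only to its own column.
filled = sample.fillna(value=defaults)

# Verify that the filled columns no longer contain missing values.
remaining = int(filled[list(defaults)].isna().sum().sum())
print(remaining)
```

Keeping the defaults in one dictionary makes it easier to review them together and to reuse them if the data is reloaded.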

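Next, we convert key columns to the correct data types: ORDERDATE to datetime, and SALES, QUANTITYORDERED, PRICEEACH, and MSRP to numeric types. Calculations, grouping, and time-based analysis all depend on these types. Passing errors='coerce' turns any value that cannot be parsed into NaT (for dates) or NaN (for numbers) instead of raising an error, so the conversion itself can introduce new missing values.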

Code to convert data types:

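# Assumes pandas is imported as pd and df is the DataFrame loaded earlier in the project

# Parse order dates; unparseable entries become NaT
df['ORDERDATE'] = pd.to_datetime(df['ORDERDATE'], errors='coerce')

# Convert numeric columns; unparseable entries become NaN
df['SALES'] = pd.to_numeric(df['SALES'], errors='coerce')
df['QUANTITYORDERED'] = pd.to_numeric(df['QUANTITYORDERED'], errors='coerce')
df['PRICEEACH'] = pd.to_numeric(df['PRICEEACH'], errors='coerce')
df['MSRP'] = pd.to_numeric(df['MSRP'], errors='coerce')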
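Because errors='coerce' replaces bad values silently, it is worth counting how many entries were lost in the conversion. The sketch below assumes a tiny hypothetical DataFrame named raw with one deliberately invalid entry per column; the values and the ISO date format are illustrative and not taken from the project dataset.

```python
import pandas as pd

# Hypothetical raw values, with one unparseable entry in each column.
raw = pd.DataFrame({
    "ORDERDATE": ["2003-02-24", "not a date", "2003-07-01"],
    "SALES": ["2871.00", "N/A", "3884.34"],
})

# Apply the same conversions as in the lesson to a copy of the data.
converted = raw.copy()
converted["ORDERDATE"] = pd.to_datetime(raw["ORDERDATE"], errors="coerce")
converted["SALES"] = pd.to_numeric(raw["SALES"], errors="coerce")

# Entries that were present before conversion but are missing afterwards.
coerced = converted.isna().sum() - raw.isna().sum()

print(converted.dtypes)
print(coerced)
```

A non-zero count for a column is a prompt to inspect the original values before deciding whether to drop, fix, or fill them.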

After cleaning and conversion, the dataset becomes consistent and analysis-ready. This prepared data will now be used for grouping, aggregation, and exploratory analysis in the next lesson.

Sales Data Analysis Project for Beginners Using Data Science
