Lesson 3: Data Auditing and Initial Inspection

Before moving ahead, we must inspect the dataset to ensure it is loaded correctly and has no structural issues. Data auditing helps detect missing values, wrong data types, or unexpected columns early.

To begin, we view the first few rows:

Code:

# View first few rows

df.head()

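Output: Displays the first 5 rows of the dataset (the default). In a notebook the result is rendered automatically; in a script, wrap the call in print().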
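
If five rows are not enough, head() accepts a row count, and tail() shows the end of the dataset in the same way. The sketch below uses a small made-up DataFrame (the column names id and score are assumptions for illustration, not part of the lesson's dataset):

```python
import pandas as pd

# Hypothetical sample data standing in for the lesson's dataset
df = pd.DataFrame({
    "id": range(1, 9),
    "score": [88.0, 92.5, 79.0, 65.5, 90.0, 71.0, 84.5, 95.0],
})

first_three = df.head(3)  # first 3 rows instead of the default 5
last_two = df.tail(2)     # last 2 rows, handy for spotting junk at the end of a file

print(first_three)
print(last_two)
```

Checking both ends of the data is a cheap way to catch footer rows or truncated loads that head() alone would miss.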

Next, we check the dataset size:

Code:

# Check number of rows and columns

df.shape

Output: A tuple of the form (number_of_rows, number_of_columns). Note that shape is an attribute, not a method, so it is written without parentheses.
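
Because the result is an ordinary tuple, it can be unpacked and used in a quick sanity check. A minimal sketch, assuming a small made-up DataFrame (the columns id and city are hypothetical):

```python
import pandas as pd

# Hypothetical sample data standing in for the lesson's dataset
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "city": ["Pune", "Delhi", "Mumbai", "Chennai"],
})

# Unpack the (rows, columns) tuple into two variables
n_rows, n_cols = df.shape
print(f"{n_rows} rows, {n_cols} columns")

# Stop early if the load produced an empty dataset
if n_rows == 0:
    raise ValueError("Dataset is empty; check the file path or load options.")
```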

Then, we examine detailed dataset information:

Code:

# Get basic information about the dataset

df.info()

Output: Prints the index range, column names, non-null counts, data types, and memory usage.
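
df.info() prints its report rather than returning it, so when the same facts are needed as values, df.dtypes and df.isna().sum() are one way to get them. A minimal sketch on made-up data (the columns id, age, and signup_date are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value in two of the columns
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "age": [34.0, np.nan, 29.0, 41.0],
    "signup_date": ["2023-01-05", "2023-02-11", None, "2023-03-20"],
})

# Data type of each column, as a Series indexed by column name
column_types = df.dtypes

# Number of missing values in each column
missing_per_column = df.isna().sum()

print(column_types)
print(missing_per_column)
```

Here signup_date holds dates stored as text, which is the kind of wrong data type this step is meant to surface.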

Finally, we review summary statistics for numerical columns:

Code:

# Show basic statistics for numerical columns

df.describe()

Output: Displays count, mean, std, min, the 25th, 50th (median), and 75th percentiles, and max for each numerical column. Missing values are excluded from these statistics.
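
By default describe() skips non-numerical columns. Passing include="all" adds them to the summary, with unique, top, and freq filled in for text columns. A minimal sketch on made-up data (the columns price and category are hypothetical):

```python
import pandas as pd

# Hypothetical sample data: one numerical and one text column
df = pd.DataFrame({
    "price": [10.0, 12.5, 11.0, 250.0],
    "category": ["a", "b", "a", "a"],
})

# Default behaviour: only the numerical column is summarised
numeric_summary = df.describe()

# include="all" summarises every column; statistics that do not
# apply to a column's type are shown as NaN
full_summary = df.describe(include="all")

print(numeric_summary)
print(full_summary)
```

A max far above the 75th percentile, as with price here, is often the first visible hint of an outlier or a data entry error.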

Together, these steps confirm the column names, data types, and dataset size, and give a first indication of data quality (missing values, implausible ranges) before moving to preprocessing.
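
The checks for unexpected columns and wrong data types can also be automated. The sketch below is one possible approach, not a standard pandas feature: the helper audit_columns, the expected column list, and the sample data are all assumptions made for illustration.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def audit_columns(df, expected_columns, numeric_columns):
    """Compare a DataFrame's columns against what we expect (hypothetical helper)."""
    actual = set(df.columns)
    expected = set(expected_columns)
    return {
        # Expected columns that are absent from the data
        "missing_columns": sorted(expected - actual),
        # Columns present in the data that nobody asked for
        "unexpected_columns": sorted(actual - expected),
        # Columns that should be numerical but were loaded as something else
        "non_numeric": [
            col for col in numeric_columns
            if col in df.columns and not is_numeric_dtype(df[col])
        ],
    }


# Hypothetical sample data with typical loading problems
df = pd.DataFrame({
    "id": [1, 2, 3],
    "amount": ["9.5", "n/a", "4.0"],  # stored as text because of the "n/a" entry
    "Unnamed: 0": [0, 1, 2],          # stray index column from an earlier export
})

report = audit_columns(
    df,
    expected_columns=["id", "amount", "region"],
    numeric_columns=["id", "amount"],
)
print(report)
```

Returning a plain dictionary keeps the helper easy to log or assert on; what to do about each finding is left to the preprocessing stage.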