Lesson 3: Data Auditing and Initial Inspection
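Before moving on to preprocessing, we inspect the dataset to confirm that it loaded correctly and has no structural problems. Auditing the data early helps catch missing values, incorrect data types, and unexpected columns before they cause errors later. The examples in this lesson assume the dataset has already been loaded into a pandas DataFrame named df.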
To begin, we view the first few rows:
Code:
# View first few rows
df.head()

Output: Displays the first 5 rows of the dataset by default. Passing a number, as in df.head(10), changes how many rows are shown.
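As a minimal sketch of the same idea, the snippet below builds a small made-up DataFrame (the column names and values are assumptions for illustration only, not part of the lesson's dataset) and looks at both ends of it. Checking the last rows with df.tail() can help spot trailing junk such as footer lines that df.head() would not reveal.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "city": ["Pune", "Delhi", "Mumbai", "Chennai", "Kolkata", "Jaipur"],
    "sales": [250.0, 310.5, 190.0, 420.0, 275.5, 330.0],
})

first_rows = df.head(3)  # first 3 rows instead of the default 5
last_rows = df.tail(2)   # last 2 rows of the dataset

print(first_rows)
print(last_rows)
```

Both calls return new DataFrames and leave df unchanged, so they are safe to run at any point during inspection.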
Next, we check the dataset size:
Code:
# Check number of rows and columns
df.shape
Output: Returns a tuple showing (number_of_rows, number_of_columns). Note that shape is an attribute, not a method, so it is written without parentheses.
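The size check can also be combined with a quick test for unexpected columns. The following is a sketch under stated assumptions: the sample data and the expected_columns list are hypothetical, and in practice the expected list would come from the dataset's documentation.

```python
import pandas as pd

# Hypothetical sample data; "Unnamed: 0" imitates a stray index column
# that sometimes appears after saving and reloading a CSV file.
df = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2, 3],
    "id": [1, 2, 3, 4],
    "city": ["Pune", "Delhi", "Mumbai", "Chennai"],
    "sales": [250.0, 310.5, 190.0, 420.0],
})

# df.shape is a tuple, so it can be unpacked directly.
n_rows, n_cols = df.shape
print(f"{n_rows} rows, {n_cols} columns")

# Assumed schema for this illustration.
expected_columns = ["id", "city", "sales"]

# Columns present in the data but not in the assumed schema.
unexpected = [col for col in df.columns if col not in expected_columns]
print("Unexpected columns:", unexpected)
```

If the column count is higher than expected, a list like unexpected points straight at the extra columns instead of leaving them to be found by eye.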
Then, we examine detailed dataset information:
Code:
# Get basic information about the dataset
df.info()
Output: Shows column names, non-null counts, data types, and memory usage. A non-null count lower than the total number of rows means that column contains missing values.
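df.info() prints its report rather than returning values, so it can be useful to get the same facts in a form that code can work with. The sketch below uses df.dtypes and df.isna().sum() for that purpose. The sample data and the expected_numeric list are assumptions made up for this illustration.

```python
import pandas as pd

# Hypothetical sample data: "age" holds numbers stored as text,
# and two columns contain a missing value.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "age": ["34", "41", None, "29"],
    "sales": [250.0, None, 190.0, 420.0],
})

# Data type of each column.
print(df.dtypes)

# Number of missing values in each column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Columns we assume should be numeric in this illustration.
expected_numeric = ["id", "age", "sales"]

# Flag columns whose data type is not numeric.
wrong_type = [
    col for col in expected_numeric
    if not pd.api.types.is_numeric_dtype(df[col])
]
print("Columns with a non-numeric type:", wrong_type)
```

Here the age column is flagged because its values were read as text, which is the kind of wrong data type this audit is meant to catch early.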
Finally, we review summary statistics for numerical columns:
Code:
# Show basic statistics for numerical columns
df.describe()

Output: Displays the count, mean, standard deviation (std), minimum, 25th, 50th, and 75th percentiles, and maximum for each numerical column.
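By default, df.describe() skips non-numerical columns when numerical ones are present. As a sketch, passing include="all" adds text columns to the summary, with rows such as unique, top, and freq. The sample data below is hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "city": ["Pune", "Delhi", "Pune", "Chennai"],
    "sales": [100.0, 200.0, 300.0, 400.0],
})

# Default behaviour: numerical columns only.
numeric_summary = df.describe()
print(numeric_summary)

# include="all" summarises every column; statistics that do not
# apply to a column (for example, the mean of a text column) are NaN.
full_summary = df.describe(include="all")
print(full_summary)
```

Comparing the two summaries shows which columns were treated as numerical, which is another quick way to notice a column that was loaded with the wrong type.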
Together, these steps verify the column names, data types, and dataset size, and give a first indication of overall data quality before moving to preprocessing.