When you load a dataset into pandas, whether it’s from a CSV file, an Excel sheet, or another source, you often want to see what you’re dealing with right away. That’s where head() comes in. It’s like taking a glance at the top of your data to understand its structure and content.
What does head() do?
The head() method in pandas returns the first n rows of a DataFrame or Series. “Head” in this context literally refers to the “head” or beginning portion of your data. By default, if you don’t specify how many rows you want, head() returns the first 5 rows.
Basic Usage
Let’s see it in action. First, we’ll create a simple DataFrame to work with:
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace'],
        'Age': [25, 30, 22, 35, 28, 40, 31],
        'City': ['New York', 'London', 'Paris', 'Tokyo', 'Sydney', 'Berlin', 'Rome']}
df = pd.DataFrame(data)
print(df)  # Let's see the whole DataFrame first
This will output the entire DataFrame:
      Name  Age      City
0    Alice   25  New York
1      Bob   30    London
2  Charlie   22     Paris
3    David   35     Tokyo
4      Eve   28    Sydney
5    Frank   40    Berlin
6    Grace   31      Rome
Now, let’s use head() without specifying the number of rows:
print(df.head())
This will give you:
      Name  Age      City
0    Alice   25  New York
1      Bob   30    London
2  Charlie   22     Paris
3    David   35     Tokyo
4      Eve   28    Sydney
As you can see, head() displayed the first 5 rows of the DataFrame.
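One detail worth making explicit: head() does not print anything by itself. It returns a new DataFrame, which is why the example above wraps it in print(). The short sketch below rebuilds a two-column version of the DataFrame and checks that head() selects the same rows as the positional slice df.iloc[:5]. The variable names first_five and same_rows are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace'],
    'Age': [25, 30, 22, 35, 28, 40, 31],
})

# head() returns a new DataFrame; nothing is shown until you print it
first_five = df.head()

# Positional slicing with iloc selects the same five rows
same_rows = df.iloc[:5]

print(type(first_five).__name__)     # DataFrame
print(first_five.equals(same_rows))  # True
print(len(df))                       # 7 (the original is unchanged)
```

Because the result is an ordinary DataFrame, you can store it in a variable or chain further methods onto it.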
Controlling the Number of Rows
You’re not limited to 5 rows. You can choose how many rows head() shows by passing an integer argument. For example, to see just the top 3 rows:
print(df.head(3))
Output:
      Name  Age      City
0    Alice   25  New York
1      Bob   30    London
2  Charlie   22     Paris
Now, only the first 3 rows are displayed. You can pass any positive integer to head() to see that many initial rows; if the number is larger than the DataFrame, you simply get every row.
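It can also help to know what happens at the edges. As a small sketch using a one-column DataFrame made up for illustration: an n larger than the data returns everything, 0 returns an empty DataFrame that still has its columns, and a negative n returns all rows except the last |n|.

```python
import pandas as pd

df = pd.DataFrame({'Age': [25, 30, 22, 35, 28, 40, 31]})

# n larger than the number of rows: all 7 rows come back, no error
print(len(df.head(100)))  # 7

# n = 0: an empty DataFrame that keeps the column labels
print(len(df.head(0)))    # 0

# Negative n: everything except the last 2 rows
print(len(df.head(-2)))   # 5
```

The negative form is handy when you want to drop a known number of trailing rows, such as a totals line at the bottom of a report.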
Why is head() useful?
When you load a large dataset, printing the entire DataFrame can be overwhelming and slow. head() lets you quickly get a sense of the column names, the kind of values each column holds, and a few sample rows, so you can understand the data’s structure. For the exact data types, check df.dtypes or df.info().
After reading data from a file, head() is a great way to quickly confirm that pandas has loaded the data correctly and that it looks as expected.
In the initial stages of exploratory data analysis (EDA), head() is often one of the first functions you’ll use to get a feel for your variables and the kinds of values they contain.
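Here is a minimal sketch of that first-look routine. To keep it self-contained, it reads a small CSV from an in-memory string instead of a real file; the csv_text contents and the people variable are invented for illustration, and in practice you would pass a file path to pd.read_csv().

```python
import io
import pandas as pd

# Stand-in for a CSV file on disk (hypothetical data)
csv_text = """Name,Age,City
Alice,25,New York
Bob,30,London
Charlie,22,Paris
David,35,Tokyo
"""

people = pd.read_csv(io.StringIO(csv_text))

# Quick sanity checks right after loading
print(people.head(2))  # first two rows: do the columns and values look right?
print(people.shape)    # (4, 3): number of rows and columns
print(people.dtypes)   # exact dtype of each column
```

Looking at head() together with shape and dtypes catches common loading problems early, such as a header row read as data or a numeric column parsed as text.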
head() with Series
head() isn’t just for DataFrames; it works just as well with pandas Series:
ages_series = df['Age']
print(ages_series.head(4))
Output:
0    25
1    30
2    22
3    35
Name: Age, dtype: int64
Here, head(4) on the ‘Age’ Series shows the first four ages.
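As a further sketch, head() on a Series keeps the index labels and the Series name, and it defaults to 5 elements just as it does for a DataFrame. The example below uses names as the index to make the labels visible; ages_by_name and top_three are illustrative variable names.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace'],
    'Age': [25, 30, 22, 35, 28, 40, 31],
})

# Build a Series of ages labelled by name instead of by position
ages_by_name = df.set_index('Name')['Age']

top_three = ages_by_name.head(3)
print(top_three)
print(top_three.name)             # Age
print(list(top_three.index))      # ['Alice', 'Bob', 'Charlie']
print(len(ages_by_name.head()))   # 5
```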