How to use head in Pandas

When you load a dataset into pandas, whether it’s from a CSV file, an Excel sheet, or another source, you often want to see what you’re dealing with right away. That’s where head() comes in. It’s like taking a glance at the top of your data to understand its structure and content.

What does head() do?

The head() method in pandas returns the first n rows of a DataFrame or Series. The name refers to the “head”, or beginning, of your data. By default, if you don’t specify how many rows you want, head() returns the first 5 rows.

Basic Usage

Let’s see it in action. First, we’ll create a simple DataFrame to work with:

import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace'],
        'Age': [25, 30, 22, 35, 28, 40, 31],
        'City': ['New York', 'London', 'Paris', 'Tokyo', 'Sydney', 'Berlin', 'Rome']}
df = pd.DataFrame(data)

print(df)  # Let's see the whole DataFrame first

This will output the entire DataFrame:

      Name  Age      City
0    Alice   25  New York
1      Bob   30    London
2  Charlie   22     Paris
3    David   35     Tokyo
4      Eve   28    Sydney
5    Frank   40    Berlin
6    Grace   31      Rome

Now, let’s use head() without specifying the number of rows:

print(df.head())

This will give you:

      Name  Age      City
0    Alice   25  New York
1      Bob   30    London
2  Charlie   22     Paris
3    David   35     Tokyo
4      Eve   28    Sydney

As you can see, head() displayed the first 5 rows of the DataFrame.
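It can help to know that head() does not print anything by itself; it returns a new object that print() then displays. The sketch below illustrates this with a two-column version of the example data. The variable name preview is only an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace'],
    'Age': [25, 30, 22, 35, 28, 40, 31],
})

# head() returns a new DataFrame holding the first 5 rows
preview = df.head()

print(type(preview).__name__)  # DataFrame
print(preview.shape)           # (5, 2)
print(len(df))                 # 7, the original is unchanged
```

Because the result is an ordinary DataFrame, you can store it, pass it to other functions, or keep chaining methods on it.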

Controlling the Number of Rows

You’re not limited to just 5 rows. You can customize how many rows head() shows by passing an integer argument. Let’s say you want to see just the top 3 rows:

print(df.head(3))

Output:

      Name  Age      City
0    Alice   25  New York
1      Bob   30    London
2  Charlie   22     Paris

Now, only the first 3 rows are displayed. You can put any positive integer inside head() to see that many initial rows.
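A few edge cases are worth trying for yourself. The sketch below, using a one-column DataFrame and illustrative variable names, shows what happens when n is larger than the data, zero, or negative. As the pandas documentation describes it, a negative n returns every row except the last |n|.

```python
import pandas as pd

df = pd.DataFrame({'Age': [25, 30, 22, 35, 28, 40, 31]})

# Asking for more rows than exist simply returns all of them
too_many = df.head(100)
print(len(too_many))          # 7

# n=0 gives an empty DataFrame that still has the columns
empty = df.head(0)
print(len(empty), list(empty.columns))   # 0 ['Age']

# A negative n drops rows from the end instead: all rows except the last 2
all_but_last_two = df.head(-2)
print(all_but_last_two['Age'].tolist())  # [25, 30, 22, 35, 28]
```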

Why is head() useful?

When you load a large dataset, printing the entire DataFrame can be overwhelming and slow. head() lets you quickly see the column names and a few sample rows, which is usually enough to understand the data’s structure. It does not list the column data types for a DataFrame; df.dtypes or df.info() will show those.

After reading data from a file, head() is a great way to quickly confirm that pandas has loaded the data correctly and that it looks as expected.
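As a rough sketch of that check, the example below reads a small CSV and looks at the first rows. To keep it self-contained, the file is simulated with an in-memory string (csv_text is a made-up stand-in for a real file on disk); with a real file you would pass its path to pd.read_csv instead.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = """Name,Age,City
Alice,25,New York
Bob,30,London
Charlie,22,Paris
"""

loaded = pd.read_csv(io.StringIO(csv_text))

# A misread header row or wrong delimiter would be visible right here
print(loaded.head(2))
print(list(loaded.columns))  # ['Name', 'Age', 'City']
```

If the column list came back as a single long string, or the first data row appeared as the header, that would suggest the delimiter or header settings need adjusting.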

In the initial stages of exploratory data analysis (EDA), head() is often one of the first methods you’ll call to get a feel for your variables and the kind of values they contain.
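Since head() only shows a handful of sample rows, it is commonly paired with other tools during a first look. The sketch below is one possible combination, not a fixed recipe: head() for sample rows, dtypes for column types, and describe() for the spread of a numeric column.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace'],
    'Age': [25, 30, 22, 35, 28, 40, 31],
})

print(df.head(3))   # a few sample rows
print(df.dtypes)    # the data type of each column

# describe() summarizes a numeric column: count, mean, min, max, quartiles
summary = df['Age'].describe()
print(summary['min'], summary['max'])  # 22.0 40.0
```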

head() with Series

head() isn’t just for DataFrames; it works just as well with pandas Series:

ages_series = df['Age']
print(ages_series.head(4))

Output:

0    25
1    30
2    22
3    35
Name: Age, dtype: int64

Here, head(4) on the ‘Age’ Series shows the first four ages.
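Because head() returns a Series here, it can be chained after other Series methods. As one illustrative use (the sorting step is an addition for this sketch, not part of the example above), sorting first and then taking the head gives the smallest values:

```python
import pandas as pd

ages = pd.Series([25, 30, 22, 35, 28, 40, 31], name='Age')

# Sort ascending, then keep the first three entries: the three youngest ages
youngest = ages.sort_values().head(3)

print(youngest.tolist())        # [22, 25, 28]
print(youngest.index.tolist())  # [2, 0, 4], the original row labels are kept
```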
