In Pandas, you can select columns by condition using boolean indexing. Boolean indexing allows you to select data based on a condition that evaluates to either True or False.
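To select columns by condition, build a boolean mask that holds one True or False value per column. Applying a comparison operator such as ==, >, <, >=, or <= to a DataFrame produces one boolean per cell, so reduce the result with .any() or .all() to get one value per column. You can then pass the mask to .loc in the column position to select the columns that meet the condition. Here are some examples: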
    import pandas as pd

    # create a sample dataframe
    data = {'name': ['John', 'Jane', 'Bob'],
            'age': [30, 25, 40],
            'city': ['New York', 'Paris', 'London']}
    df = pd.DataFrame(data)

    # select numeric columns that contain at least one value greater than 25
    # (non-numeric columns are skipped because comparing strings with 25 raises a TypeError)
    age_mask = df.apply(lambda col: pd.api.types.is_numeric_dtype(col) and bool((col > 25).any()))
    age_columns = df.loc[:, age_mask]
    print(age_columns)

    # select columns that contain either 'Paris' or 'London'
    city_mask = df.isin(['Paris', 'London']).any()
    city_columns = df.loc[:, city_mask]
    print(city_columns)

    # select columns that contain a value starting with 'J'
    name_mask = df.apply(lambda col: col.astype(str).str.startswith('J').any())
    name_columns = df.loc[:, name_mask]
    print(name_columns)

This will output:

       age
    0   30
    1   25
    2   40
           city
    0  New York
    1     Paris
    2    London
       name
    0  John
    1  Jane
    2   Bob

In the example above, we first created a sample DataFrame with ‘name’, ‘age’, and ‘city’ columns. We then built three boolean masks, each holding one value per column, and used them to select different subsets of columns:
- df.loc[:, age_mask] selects the numeric columns that contain at least one value greater than 25 (here, ‘age’).
- df.loc[:, city_mask] selects the columns that contain ‘Paris’ or ‘London’ (here, ‘city’).
- df.loc[:, name_mask] selects the columns that contain a value starting with ‘J’ (here, ‘name’).
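A condition applied to a whole DataFrame yields one boolean per cell, so it has to be reduced to one boolean per column before it can select columns. The following is a minimal sketch of the two usual reductions, .any() and .all(). The all-numeric frame `scores` and its column names are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical all-numeric frame, used only for illustration.
scores = pd.DataFrame({'math': [30, 25, 40],
                       'art': [10, 20, 15],
                       'music': [50, 60, 70]})

cellwise = scores > 25      # one boolean per cell, same shape as scores

any_mask = cellwise.any()   # True if at least one value in the column passes
all_mask = cellwise.all()   # True only if every value in the column passes

any_columns = scores.loc[:, any_mask]   # keeps 'math' and 'music'
all_columns = scores.loc[:, all_mask]   # keeps only 'music'

print(list(any_columns.columns))
print(list(all_columns.columns))
```

Which reduction to use depends on the question being asked: .any() keeps a column as soon as one value qualifies, while .all() keeps it only when no value fails.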
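Note that to select columns, the boolean mask goes in the second position of the .loc indexer. The : symbol in the first position selects all rows, while the mask in the second position selects the columns whose entry is True. The mask must line up with the columns, either as a Series indexed by the column labels or as an array with one entry per column. A row-wise mask such as df['age'] > 25 has one entry per row, so it filters rows with df.loc[mask] and cannot be used to select columns.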
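To make the difference between the two positions of .loc concrete, here is a short sketch that reuses the sample DataFrame and contrasts a row-wise mask with a column-wise one. The names `row_mask`, `col_mask`, and `error_name` are illustrative. The last step assumes a reasonably recent pandas version, where a row-indexed mask in the column position is expected to raise an IndexingError.

```python
import pandas as pd

df = pd.DataFrame({'name': ['John', 'Jane', 'Bob'],
                   'age': [30, 25, 40],
                   'city': ['New York', 'Paris', 'London']})

row_mask = df['age'] > 25                     # Series indexed by row labels 0, 1, 2
col_mask = df.columns.isin(['age', 'city'])   # array with one boolean per column

rows = df.loc[row_mask]       # filters rows: keeps John and Bob
cols = df.loc[:, col_mask]    # selects columns: keeps 'age' and 'city'

# A row-wise mask in the column position cannot be aligned with the column labels.
try:
    df.loc[:, row_mask]
    error_name = None
except Exception as exc:      # assumption: IndexingError on recent pandas versions
    error_name = type(exc).__name__

print(rows)
print(cols)
print(error_name)
```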