This article describes how to select rows of pandas.DataFrame
by multiple conditions.
- Basic method for selecting rows of pandas.DataFrame
- Select rows with multiple conditions
- The operator precedence
Two points to note are:
- Use &, |, ~ (not and, or, not)
- Enclose each conditional expression in parentheses when using comparison operators
Error when using and, or, not:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Error when no parentheses:
TypeError: cannot compare a dtyped [object] array with a scalar of type [bool]
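The two pitfalls above can be reproduced with a minimal sketch. It assumes a small stand-in DataFrame built inline rather than the CSV file used in this article. The exact exception raised when parentheses are missing depends on the column dtypes and the pandas version, so the sketch catches both TypeError and ValueError for that case.

```python
import pandas as pd

# Stand-in data for illustration only (not the article's CSV file).
df = pd.DataFrame({
    'age': [24, 42, 18],
    'point': [64, 92, 70],
})

# Pitfall 1: `and` asks for the truth value of a whole Series.
try:
    df[(df['age'] < 35) and (df['point'] > 60)]
except ValueError as e:
    error_with_and = type(e).__name__

# Pitfall 2: without parentheses, `&` binds tighter than `<` and `>`,
# so Python evaluates `35 & df['point']` first and then chains the comparisons.
try:
    df[df['age'] < 35 & df['point'] > 60]
except (TypeError, ValueError) as e:
    error_without_parens = type(e).__name__

# Correct form: `&` plus parentheses around each comparison.
selected = df[(df['age'] < 35) & (df['point'] > 60)]
print(selected)
```

Only the last expression selects rows; the first two fail before any selection happens.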
In the sample code, the following CSV file is read and used.
- sample_pandas_normal.csv
import pandas as pd
df = pd.read_csv('data/src/sample_pandas_normal.csv')
print(df)
#       name  age state  point
# 0    Alice   24    NY     64
# 1      Bob   42    CA     92
# 2  Charlie   18    CA     70
# 3     Dave   68    TX     70
# 4    Ellen   24    CA     88
# 5    Frank   30    NY     57
The sample code uses pandas.DataFrame, but the same applies to pandas.Series.
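As a rough illustration of the pandas.Series case, the sketch below combines two conditions with | on a Series. The Series and its values are made up for this example and are not part of the sample CSV.

```python
import pandas as pd

# Hypothetical Series for illustration: ages indexed by name.
s = pd.Series([24, 42, 18, 68], index=['Alice', 'Bob', 'Charlie', 'Dave'])

# Each comparison is wrapped in parentheses and combined with `|` (OR).
young_or_senior = s[(s < 20) | (s > 60)]
print(young_or_senior)
```

The rules are the same as for pandas.DataFrame: use &, |, ~ and parenthesize each comparison.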
Basic method for selecting rows of pandas.DataFrame
By passing a list, array, or pandas.Series of bool values, you can select the rows whose corresponding element is True.
mask = [True, False, True, False, True, False]
df_mask = df[mask]
print(df_mask)
#       name  age state  point
# 0    Alice   24    NY     64
# 2  Charlie   18    CA     70
# 4    Ellen   24    CA     88
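In practice the mask is usually produced by a comparison rather than written by hand. The following sketch assumes a DataFrame built inline with the same values as the sample table, instead of reading the CSV file.

```python
import pandas as pd

# Inline copy of the sample table (an assumption standing in for the CSV file).
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Dave', 'Ellen', 'Frank'],
    'age': [24, 42, 18, 68, 24, 30],
    'state': ['NY', 'CA', 'CA', 'TX', 'CA', 'NY'],
    'point': [64, 92, 70, 70, 88, 57],
})

# A comparison on a column returns a pandas.Series of bool.
mask = df['age'] < 30
print(mask.tolist())

# Passing that Series to [] keeps only the rows where it is True.
df_young = df[mask]
print(df_young['name'].tolist())
```

With this data, the condition age < 30 happens to give the same True/False pattern as the hand-written list above, so the same three rows are selected.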
Select rows with multiple conditions
You can get a pandas.Series of bool that is the AND of two conditions by using &.
Note that == and ~ are used here as the second condition for the sake of explanation, but you can use != as well.
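To make the remark about != concrete, here is a short sketch comparing the two spellings. It assumes an inline copy of the sample table, and the specific conditions (age under 35, state not NY) are chosen only for illustration.

```python
import pandas as pd

# Inline copy of the sample table (an assumption standing in for the CSV file).
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Dave', 'Ellen', 'Frank'],
    'age': [24, 42, 18, 68, 24, 30],
    'state': ['NY', 'CA', 'CA', 'TX', 'CA', 'NY'],
    'point': [64, 92, 70, 70, 88, 57],
})

# Second condition written with == and ~ (NOT).
with_not = (df['age'] < 35) & ~(df['state'] == 'NY')

# The same condition written with !=.
with_ne = (df['age'] < 35) & (df['state'] != 'NY')

# Both masks are identical, so they select the same rows.
print(with_not.equals(with_ne))
print(df[with_ne])
```

Either form works; != is shorter, while ~ is useful when negating a condition that has no direct opposite operator.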
print[df['age']