I'm working with some pandas DataFrames and I can't quite see why some boolean expressions are allowed and work in the .loc selector while others raise an error. To be precise, let's take the following DataFrame:
import pandas as pd
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two thr two two one thr'.split()})
Now both 'two' == 'two' and 'w' in 'two' evaluate to True, but when used with df.loc[...] the following works:
df.loc[df['B'] == 'two']
printing out
     A    B
2  foo  two
4  foo  two
5  bar  two
But the following raises a KeyError: False:
df.loc['w' in df['B']]
I know ways to work around this, but none of them feel particularly smooth and, even worse, I don't understand at all why the 'w' in df['B'] selector is not allowed in .loc.
Have a look at the output of df['B'] == 'two' and compare it to the output of 'w' in df['B']. The first one outputs a pandas Series containing either True or False for each row in df['B']. The second one outputs the single value False, because the in operator on a Series tests the index labels, not the values.
The .loc indexer can take "A boolean array of the same length as the axis being sliced, e.g. [True, False, True]" (see the .loc documentation). You obtain the KeyError: False because .loc tries to find the label False, which is neither a column nor a row name.
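To see the difference directly, you can evaluate both expressions on their own before passing them to .loc. This is only a small sketch using the DataFrame from the question; the names mask and scalar are illustrative, and the exact KeyError message may differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two thr two two one thr'.split()})

mask = df['B'] == 'two'   # element-wise comparison: a boolean Series, one value per row
scalar = 'w' in df['B']   # membership test on the index labels: a single bool

print(type(mask).__name__, len(mask))  # Series 8
print(scalar)                          # False: 'w' is not an index label
print(2 in df['B'])                    # True: 2 is an index label
print('two' in df['B'].values)         # True: .values exposes the data itself

raised = False
try:
    df.loc[scalar]                     # equivalent to df.loc[False]
except KeyError as err:
    raised = True
    print('KeyError:', err)
```

So the element-wise comparison gives .loc a mask it can use, while the in expression collapses to one boolean that .loc treats as a label.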
To use the 'w' in df['B'] expression you could do:
list_true_false = ['w' in entry for entry in df['B']]
df.loc[list_true_false]
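As a quick check of why this works, the comprehension applies the in test to each string separately, so the result has one boolean per row, which is the shape .loc expects. A minimal sketch with the question's DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two thr two two one thr'.split()})

# Iterating over a Series yields its values, so each entry is a plain string
list_true_false = ['w' in entry for entry in df['B']]

print(list_true_false)                  # [False, False, True, False, True, True, False, False]
print(len(list_true_false) == len(df))  # True: one boolean per row
selected = df.loc[list_true_false]
print(selected.index.tolist())          # [2, 4, 5]
```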
Hope that helps!
You need the isin method or the str.contains method:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.contains.html
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.isin.html
df.loc[df['B'].isin(['two'])] # to match the full word specify it as list
df.loc[df['B'].str.contains('w')] # to match the pattern or a letter
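The two methods behave differently, which the following sketch tries to illustrate: isin compares whole values against a list, while str.contains looks inside each string and treats its argument as a regular expression by default (regex=False switches to a literal substring match). The variable names exact, substring and pattern are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two thr two two one thr'.split()})

exact = df['B'].isin(['two', 'thr'])                # whole-value match against a list
substring = df['B'].str.contains('w', regex=False)  # literal substring match
pattern = df['B'].str.contains(r'e$')               # regex: values ending in 'e'

print(df.loc[exact].index.tolist())      # [2, 3, 4, 5, 7]
print(df.loc[substring].index.tolist())  # [2, 4, 5]
print(df.loc[pattern].index.tolist())    # [0, 1, 6]
```

All three expressions return a boolean Series aligned with df, so they can be passed to .loc directly.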