
Pandas: select rows if keyword appears in any column

I know there is a related thread about searching for a string in one column, but how does one use pd.Series.str.contains(pattern) across all columns?

import pandas as pd

df = pd.DataFrame({'vals': [1, 2, 3, 4],
                   'ids': [u'aball', u'bball', u'cnut', u'fball'],
                   'id2': [u'uball', u'mball', u'pnut', u'zball']})


In [3]: df[df['ids'].str.contains("ball")]
Out[3]:
   vals    ids    id2
0     1  aball  uball
1     2  bball  mball
3     4  fball  zball

Use select_dtypes to keep only the object columns (these hold the strings), then test each cell with applymap and the in operator:

df = pd.DataFrame({'vals': [1, 2, 3, 4], 
                   'ids': [None, u'bball', u'cnut', u'fball'],
                   'id2': [u'uball', u'mball', u'pnut', u'zball']})
print (df)
   vals    ids    id2
0     1   None  uball
1     2  bball  mball
2     3   cnut   pnut
3     4  fball  zball

mask = df.select_dtypes(include=[object]).applymap(lambda x: 'ball' in x if pd.notnull(x) else False)
# if the columns never contain NaN or None:
#mask = df.select_dtypes(include=[object]).applymap(lambda x: 'ball' in x)
print (mask)
     ids    id2
0  False   True
1   True   True
2  False  False
3   True   True
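On pandas 2.1 and later, applymap is deprecated in favour of DataFrame.map. A minimal sketch that runs on either side of that change, assuming the string columns are known up front (listing them by hand as ['ids', 'id2'] is an assumption, not part of the answer above) and using isinstance so that None, NaN and numbers simply count as no match:

```python
import pandas as pd

df = pd.DataFrame({'vals': [1, 2, 3, 4],
                   'ids': [None, 'bball', 'cnut', 'fball'],
                   'id2': ['uball', 'mball', 'pnut', 'zball']})


def has_ball(x):
    # Only real strings can match; None, NaN and numbers count as False
    return isinstance(x, str) and 'ball' in x


# The string columns, chosen by hand here (an assumption for this sketch)
text = df[['ids', 'id2']]

# DataFrame.map exists from pandas 2.1; fall back to applymap on older versions
if hasattr(pd.DataFrame, 'map'):
    mask = text.map(has_ball)
else:
    mask = text.applymap(has_ball)

print(mask)
```

The isinstance check replaces the pd.notnull guard, so the same function also copes with non-string values that happen to sit in a text column.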

Another solution is to use apply with str.contains:

mask = df.select_dtypes(include=[object]).apply(lambda x: x.str.contains('ball', na=False))
# if the columns never contain NaN or None:
#mask = df.select_dtypes(include=[object]).apply(lambda x: x.str.contains('ball'))
print (mask)
     ids    id2
0  False   True
1   True   True
2  False  False
3   True   True
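Note that str.contains treats its pattern as a regular expression by default, so characters such as '.' are special. A small sketch on a made-up DataFrame (the values below are illustrative, not from the question) comparing the default with the regex=False and case=False options of str.contains:

```python
import pandas as pd

# Illustrative data: 'a.ball' contains the literal text 'a.b', 'axball' does not
df = pd.DataFrame({'vals': [1, 2, 3],
                   'ids': ['a.ball', 'axball', 'cnut'],
                   'id2': ['xx', 'BALL', 'zz']})

text = df[['ids', 'id2']]  # string columns listed by hand for this sketch

# Default (regex=True): '.' matches any character, so 'axb' also matches
as_regex = text.apply(lambda col: col.str.contains('a.b', na=False))

# regex=False: 'a.b' is matched as a literal substring
literal = text.apply(lambda col: col.str.contains('a.b', regex=False, na=False))

# case=False: case-insensitive match, so 'BALL' counts
no_case = text.apply(lambda col: col.str.contains('ball', case=False, na=False))

print(as_regex, literal, no_case, sep='\n\n')
```

regex=False is the safer choice when the keyword comes from user input and should be matched exactly as typed.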

Then filter with DataFrame.any to keep rows where at least one column is True, or with DataFrame.all to keep rows where every column is True:

df1 = df[mask.any(axis=1)]
print (df1)
   vals    ids    id2
0     1   None  uball
1     2  bball  mball
3     4  fball  zball

df2 = df[mask.all(axis=1)]
print (df2)
   vals    ids    id2
1     2  bball  mball
3     4  fball  zball
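The mask-then-filter steps can be wrapped in one function. This is a sketch only: rows_containing is a hypothetical helper name, and it assumes the caller passes the list of string columns explicitly rather than relying on select_dtypes:

```python
import pandas as pd


def rows_containing(df, substring, columns, how='any'):
    """Hypothetical helper: keep rows whose given columns contain `substring`.

    how='any' keeps rows where at least one column matches,
    how='all' keeps rows where every listed column matches.
    """
    if how not in ('any', 'all'):
        raise ValueError("how must be 'any' or 'all'")
    # Literal substring test per cell; missing values count as no match
    mask = df[columns].apply(
        lambda col: col.str.contains(substring, regex=False, na=False))
    # Reduce the boolean mask to one flag per row
    row_mask = mask.any(axis=1) if how == 'any' else mask.all(axis=1)
    return df[row_mask]


df = pd.DataFrame({'vals': [1, 2, 3, 4],
                   'ids': [None, 'bball', 'cnut', 'fball'],
                   'id2': ['uball', 'mball', 'pnut', 'zball']})

print(rows_containing(df, 'ball', ['ids', 'id2']))
print(rows_containing(df, 'ball', ['ids', 'id2'], how='all'))
```

Passing the columns explicitly keeps numeric columns such as vals out of the mask, which matters for how='all': a column that can never match would otherwise exclude every row.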

stack

If you select only the columns that might contain 'ball', i.e. those of dtype object, you can stack the resulting DataFrame into a Series. You can then apply pandas.Series.str.contains and unstack the result back into a DataFrame. Applied to the question's original df, which has no missing values:

df.select_dtypes(include=[object]).stack().str.contains('ball').unstack()

     ids    id2
0   True   True
1   True   True
2  False  False
3   True   True
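If the goal is a per-row filter rather than the per-cell table, the stacked Series can be reduced directly instead of being unstacked. A sketch under stated assumptions: the string columns are listed by hand, groupby(level=0).any() is swapped in for unstack, and na=False plus reindex cover missing values, whether stack drops them (older pandas) or keeps them (newer pandas):

```python
import pandas as pd

df = pd.DataFrame({'vals': [1, 2, 3, 4],
                   'ids': [None, 'bball', 'cnut', 'fball'],
                   'id2': ['uball', 'mball', 'pnut', 'zball']})

# One Series indexed by (row label, column name) over the string columns
stacked = df[['ids', 'id2']].stack()

# Per-cell test; any missing value that survives stacking counts as no match
hits = stacked.str.contains('ball', na=False)

# Collapse to one flag per original row; rows with no stacked cells get False
row_mask = hits.groupby(level=0).any().reindex(df.index, fill_value=False)

print(df[row_mask])
```

Grouping on level 0 of the MultiIndex gives the same result as unstack followed by any(axis=1), without building the intermediate DataFrame.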
