Select rows from dataframe based on substring A or B in a column

Sorry, I needed to edit my question, as I'm actually looking for substrings with more than one character. The suggested answers are good, but most of them only work for single-character strings.

import pandas as pd

test = pd.DataFrame({'A': 'ju1 j4 abjul boy noc s1 asep'.split(),
                 'B': [1, 2, 3, 4, 5, 6, 7]})
print(test)


       A  B
0    ju1  1
1     j4  2
2  abjul  3
3    boy  4
4    noc  5
5     s1  6
6   asep  7

I know I can select all the rows that contain 'ju' with

subset = test[test['A'].str.contains('ju')]
print(subset)

       A  B
0    ju1  1
2  abjul  3

Is there an elegant way to select all rows that contain either 'ju' or 'as'?

This works, as suggested below. Are there other ways that also work?

ju = test.A.str.contains('ju')
as_ = test.A.str.contains('as')  # `as` is a reserved keyword in Python, so it can't be a variable name
subset = test[ju | as_]
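A minimal sketch of one more way: `str.contains` interprets its pattern as a regular expression by default, so a single alternation pattern such as `'ju|as'` can replace the separate masks. Wrapping each substring in `re.escape` is an addition here (not part of the question); it keeps characters like `.` or `+` from being read as regex syntax. The list name `substrings` is only illustrative.

```python
import re

import pandas as pd

test = pd.DataFrame({'A': 'ju1 j4 abjul boy noc s1 asep'.split(),
                     'B': [1, 2, 3, 4, 5, 6, 7]})

# Substrings to look for; a row is kept if it contains any of them
substrings = ['ju', 'as']

# Join the escaped substrings into one alternation pattern: 'ju|as'
pattern = '|'.join(re.escape(sub) for sub in substrings)

subset = test[test['A'].str.contains(pattern)]
print(subset)
```

With the question's data this keeps `ju1`, `abjul` and `asep` (index 0, 2 and 6).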
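Answers (written for the earlier, single-character version of the question, which matched 'j' or 's'):

In [13]: test.loc[test.A.str.contains(r'[js]')]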
Out[13]:
       A  B
0     j1  1
1     j4  2
2  abjul  3
5     s1  6
6   asep  7

option 1
try using str.match

test[test.A.str.match('.*[js].*')]
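Option 1 can be adapted to multi-character substrings by replacing the character class `[js]` with a non-capturing group holding an alternation. A minimal sketch, assuming the substrings of interest are 'ju' and 'as' (the name `substrings` is illustrative); it selects rows 0, 2 and 6 of the question's data:

```python
import re

import pandas as pd

test = pd.DataFrame({'A': 'ju1 j4 abjul boy noc s1 asep'.split(),
                     'B': [1, 2, 3, 4, 5, 6, 7]})

substrings = ['ju', 'as']

# str.match anchors at the start of each value, so the leading '.*'
# lets the group (?:ju|as) match anywhere in the string
pattern = '.*(?:' + '|'.join(re.escape(sub) for sub in substrings) + ')'

subset = test[test['A'].str.match(pattern)]
print(subset)
```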

option 2
set operations

s = test.A.apply(set)
test[s.sub(set(list('js'))).lt(s)]
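Option 2 compares sets of characters, so it cannot tell whether 'j' and 'u' appear next to each other as the substring 'ju'. A plain-Python sketch that keeps the `apply` style but tests real substrings with `in` (the `substrings` list is an assumption for illustration); on the question's data it selects rows 0, 2 and 6:

```python
import pandas as pd

test = pd.DataFrame({'A': 'ju1 j4 abjul boy noc s1 asep'.split(),
                     'B': [1, 2, 3, 4, 5, 6, 7]})

substrings = ['ju', 'as']

# `sub in word` is Python's substring test, so multi-character
# substrings work; any() stops at the first hit
mask = test['A'].apply(lambda word: any(sub in word for sub in substrings))

subset = test[mask]
print(subset)
```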

option 3
set operations with numpy broadcasting

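import numpy as np
s = test.A.apply(set)
# {'j'} - set(word) is empty exactly when the word contains 'j', so ~(...).astype(bool) flags a hit
test[(~(np.array([[set(['j'])], [set(['s'])]]) - s.values).astype(bool)).any(0)]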
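The broadcasting idea in option 3 also works for multi-character substrings if the per-element test is Python's `in` instead of set difference. The sketch below swaps in `np.frompyfunc` (a different technique from the answer's set arithmetic) to turn that test into a ufunc that broadcasts; the `substrings` array is illustrative. It selects rows 0, 2 and 6 of the question's data:

```python
import numpy as np
import pandas as pd

test = pd.DataFrame({'A': 'ju1 j4 abjul boy noc s1 asep'.split(),
                     'B': [1, 2, 3, 4, 5, 6, 7]})

words = test['A'].to_numpy(dtype=object)           # shape (7,)
substrings = np.array(['ju', 'as'], dtype=object)  # shape (2,)

# Turn Python's substring test into a ufunc so it broadcasts;
# frompyfunc(func, 2, 1): two inputs, one output (object dtype)
contains = np.frompyfunc(lambda word, sub: sub in word, 2, 1)

# (1, 7) against (2, 1) broadcasts to a (2, 7) table:
# hits[i, k] is True when substrings[i] occurs in words[k]
hits = contains(words[np.newaxis, :], substrings[:, np.newaxis]).astype(bool)

# Keep a row if any substring occurs in it
subset = test[hits.any(axis=0)]
print(subset)
```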

option 4
separate conditions

cond_j = test.A.str.contains('j')
cond_s = test.A.str.contains('s')
test[cond_j | cond_s]
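Option 4 generalizes to any number of multi-character substrings by building one mask per substring and OR-ing them together. A minimal sketch, assuming the substrings are 'ju' and 'as' (the names `substrings` and `masks` are illustrative); on the question's data it selects rows 0, 2 and 6:

```python
import operator
from functools import reduce

import pandas as pd

test = pd.DataFrame({'A': 'ju1 j4 abjul boy noc s1 asep'.split(),
                     'B': [1, 2, 3, 4, 5, 6, 7]})

substrings = ['ju', 'as']

# One boolean mask per substring; regex=False makes each a plain
# literal search instead of a regular expression
masks = [test['A'].str.contains(sub, regex=False) for sub in substrings]

# OR the masks together: the same as masks[0] | masks[1] | ...
mask = reduce(operator.or_, masks)

subset = test[mask]
print(subset)
```

Using `regex=False` means substrings containing regex metacharacters need no escaping.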

All yield

       A  B
0     j1  1
1     j4  2
2  abjul  3
5     s1  6
6   asep  7

time testing

[timing comparison chart not preserved]

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
© 2020-2024 STACKOOM.COM