Select rows from dataframe based on substring A or B in a column
Sorry, I needed to edit my question, as I'm actually looking for substrings with more than one character. The suggested answers are good, but they mostly work for single-character strings.
import pandas as pd
test = pd.DataFrame({'A': 'ju1 j4 abjul boy noc s1 asep'.split(),
'B': [1, 2, 3, 4, 5, 6, 7]})
print(test)
       A  B
0    ju1  1
1     j4  2
2  abjul  3
3    boy  4
4    noc  5
5     s1  6
6   asep  7
I know I can select all the rows that contain 'ju' with
subset = test[test['A'].str.contains('ju')]
print(subset)
       A  B
0    ju1  1
2  abjul  3
Is there an elegant way to select all rows that contain either 'ju' or 'as'?
The following works, as suggested below. Are there other ways that also work?
has_ju = test.A.str.contains('ju')
has_as = test.A.str.contains('as')  # 'as' is a Python keyword, so it cannot be a variable name
subset = test[has_ju | has_as]
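A compact alternative for multi-character substrings is a single regular expression with alternation, since `str.contains` treats its pattern as a regex by default. A minimal sketch, assuming the list `substrings` (a name introduced here for illustration) holds the literal substrings to search for:

```python
import re

import pandas as pd

test = pd.DataFrame({'A': 'ju1 j4 abjul boy noc s1 asep'.split(),
                     'B': [1, 2, 3, 4, 5, 6, 7]})

# Hypothetical list of literal substrings to search for; any length works.
substrings = ['ju', 'as']

# Join them into one alternation, here 'ju|as'. re.escape keeps characters
# such as '.', '+' or '(' from being read as regex syntax.
pattern = '|'.join(re.escape(sub) for sub in substrings)

subset = test[test['A'].str.contains(pattern)]
print(subset)
```

With the sample data this keeps the rows with index 0, 2 and 6 (ju1, abjul, asep). Searching for another substring only means extending the list.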
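Answer (written for the earlier, single-character version of the question; the pattern r'[js]' matches any row containing a 'j' or an 's'):
test.loc[test.A.str.contains(r'[js]')]
       A  B
0    ju1  1
1     j4  2
2  abjul  3
5     s1  6
6   asep  7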
option 1
try using str.match
test[test.A.str.match('.*[js].*')]
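`str.match` anchors at the start of each string, which is why option 1 prefixes the pattern with `.*`. For the multi-character case the character class can be swapped for a non-capturing alternation. A sketch on the same sample data (the pattern is an illustration, not part of the original answer):

```python
import pandas as pd

test = pd.DataFrame({'A': 'ju1 j4 abjul boy noc s1 asep'.split(),
                     'B': [1, 2, 3, 4, 5, 6, 7]})

# str.match only anchors at the beginning, so the leading '.*' lets the group
# match anywhere in the string; (?:...) groups the alternatives without capturing.
subset = test[test.A.str.match(r'.*(?:ju|as)')]
print(subset)
```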
option 2
set operations
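s = test.A.apply(set)  # each row becomes the set of its characters
# Removing {'j', 's'} leaves a proper subset (lt) only if the row contained 'j' or 's'.
test[s.sub(set(list('js'))).lt(s)]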
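The set trick compares individual characters, so passing 'ju' would match any row containing a 'j' or a 'u'. For whole substrings, a plain-Python test with `in` inside `Series.apply` is a straightforward alternative. A sketch, with `substrings` again a hypothetical name:

```python
import pandas as pd

test = pd.DataFrame({'A': 'ju1 j4 abjul boy noc s1 asep'.split(),
                     'B': [1, 2, 3, 4, 5, 6, 7]})

# Hypothetical list of substrings to look for.
substrings = ['ju', 'as']

# For each value, check whether at least one substring occurs in it.
mask = test.A.apply(lambda text: any(sub in text for sub in substrings))
subset = test[mask]
print(subset)
```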
option 3
set operations with numpy broadcasting
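import numpy as np

s = test.A.apply(set)
# A (2, 1) array of one-character sets minus the row sets broadcasts to (2, 7);
# an empty difference means that character is present in the row.
test[(~(np.array([[set(['j'])], [set(['s'])]]) - s.values).astype(bool)).any(0)]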
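Option 3 broadcasts single-character sets. A similar broadcasting layout works for multi-character substrings if the Python `in` operator is wrapped as a ufunc with `np.frompyfunc`. This is a sketch of that idea, not part of the original answer; `substrings` and `contains` are illustrative names:

```python
import numpy as np
import pandas as pd

test = pd.DataFrame({'A': 'ju1 j4 abjul boy noc s1 asep'.split(),
                     'B': [1, 2, 3, 4, 5, 6, 7]})

# Hypothetical substrings as a (2, 1) object column.
substrings = np.array(['ju', 'as'], dtype=object)[:, None]
values = test.A.to_numpy(dtype=object)  # shape (7,)

# np.frompyfunc turns a two-argument Python function into a broadcasting ufunc;
# here it yields a (2, 7) object array of True/False results.
contains = np.frompyfunc(lambda sub, text: sub in text, 2, 1)
grid = contains(substrings, values).astype(bool)

# Keep a row if any substring (axis 0) occurs in it.
subset = test[grid.any(axis=0)]
print(subset)
```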
option 4
separate conditions
cond_j = test.A.str.contains('j')
cond_s = test.A.str.contains('s')
test[cond_j | cond_s]
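Option 4 extends to any number of substrings by building one mask per substring and OR-ing them together, for example with `functools.reduce`. In this sketch `regex=False` makes each substring a literal match; `substrings` is an illustrative name:

```python
import operator
from functools import reduce

import pandas as pd

test = pd.DataFrame({'A': 'ju1 j4 abjul boy noc s1 asep'.split(),
                     'B': [1, 2, 3, 4, 5, 6, 7]})

# Hypothetical list of substrings to look for.
substrings = ['ju', 'as']

# One boolean mask per substring; regex=False treats each one literally.
conditions = [test.A.str.contains(sub, regex=False) for sub in substrings]

# Combine the masks with element-wise OR, like cond_j | cond_s in option 4.
mask = reduce(operator.or_, conditions)
subset = test[mask]
print(subset)
```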
All four options yield:
       A  B
0    ju1  1
1     j4  2
2  abjul  3
5     s1  6
6   asep  7
time testing
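Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.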