Select rows from dataframe based on substring A or B in a column

Sorry, I needed to edit my question as I'm actually looking for substrings with more than one character. The suggested answers are good, but they mostly work only for single-character strings.

import pandas as pd

test = pd.DataFrame({'A': 'ju1 j4 abjul boy noc s1 asep'.split(),
                 'B': [1, 2, 3, 4, 5, 6, 7]})
print(test)


       A  B
0    ju1  1
1     j4  2
2  abjul  3
3    boy  4
4    noc  5
5     s1  6
6   asep  7

I know I can select all the rows that contain 'ju' with:

subset = test[test['A'].str.contains('ju')]
print(subset)

       A  B
0    ju1  1
2  abjul  3

Is there an elegant way to select all rows that contain either 'ju' or 'as'?

This works, as suggested below. Are there other ways that also work?

has_ju = test.A.str.contains('ju')
has_as = test.A.str.contains('as')  # `as` is a reserved word, so it cannot be a variable name
subset = test[has_ju | has_as]

In [13]: test.loc[test.A.str.contains(r'[js]')]
Out[13]:
       A  B
0    ju1  1
1     j4  2
2  abjul  3
5     s1  6
6   asep  7
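
For the multi-character substrings in the edited question, a character class such as [js] only matches single characters. A minimal sketch of one alternative, assuming the substrings are kept in a list (the name `substrings` is made up for this example): join them into a regex alternation, escaping each one so that it is matched literally.

```python
import re

import pandas as pd

test = pd.DataFrame({'A': 'ju1 j4 abjul boy noc s1 asep'.split(),
                     'B': [1, 2, 3, 4, 5, 6, 7]})

# Hypothetical list of substrings to search for (an assumption for this sketch).
substrings = ['ju', 'as']

# Build the alternation 'ju|as'; re.escape keeps characters like '.' or '+' literal.
pattern = '|'.join(re.escape(s) for s in substrings)

# str.contains interprets the pattern as a regular expression by default.
subset = test[test['A'].str.contains(pattern)]
print(subset)
```

With the sample frame from the question this should keep the rows ju1, abjul and asep.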

option 1
try using str.match

test[test.A.str.match('.*[js].*')]

option 2
set operations

s = test.A.apply(set)
test[s.sub(set(list('js'))).lt(s)]

option 3
set operations with numpy broadcasting

import numpy as np

s = test.A.apply(set)
test[(~(np.array([[set(['j'])], [set(['s'])]]) - s.values).astype(bool)).any(0)]

option 4
separate conditions

cond_j = test.A.str.contains('j')
cond_s = test.A.str.contains('s')
test[cond_j | cond_s]

All yield

       A  B
0    ju1  1
1     j4  2
2  abjul  3
5     s1  6
6   asep  7
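
The "separate conditions" idea carries over to multi-character substrings such as 'ju' and 'as'. A rough sketch, assuming the substrings live in a list (`substrings` is a name invented here, not part of the answer): build one mask per substring with plain, non-regex matching and OR them together.

```python
import operator
from functools import reduce

import pandas as pd

test = pd.DataFrame({'A': 'ju1 j4 abjul boy noc s1 asep'.split(),
                     'B': [1, 2, 3, 4, 5, 6, 7]})

# Hypothetical list of substrings to search for (an assumption for this sketch).
substrings = ['ju', 'as']

# One boolean mask per substring; regex=False does literal substring matching.
masks = [test['A'].str.contains(s, regex=False) for s in substrings]

# Combine the masks with element-wise OR, like cond_j | cond_s above.
combined = reduce(operator.or_, masks)
subset = test[combined]
print(subset)
```

Because nothing is interpreted as a regular expression, this variant needs no escaping, at the cost of one pass over the column per substring.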

time testing

[image: timing results]
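
Timings depend on the data size and the pandas version, so it may be worth measuring locally. A minimal sketch with the standard-library timeit module, comparing a regex alternation with two separate conditions for the substrings 'ju' and 'as'; the function names are made up for this example, and no results are claimed here.

```python
import timeit

import pandas as pd

test = pd.DataFrame({'A': 'ju1 j4 abjul boy noc s1 asep'.split(),
                     'B': [1, 2, 3, 4, 5, 6, 7]})


def with_alternation():
    # Single pass with the regex alternation 'ju|as'.
    return test[test['A'].str.contains('ju|as')]


def with_separate_masks():
    # Two passes, one literal match per substring, combined with OR.
    has_ju = test['A'].str.contains('ju', regex=False)
    has_as = test['A'].str.contains('as', regex=False)
    return test[has_ju | has_as]


# Time each approach over the same number of runs and print the totals.
for func in (with_alternation, with_separate_masks):
    seconds = timeit.timeit(func, number=100)
    print(f'{func.__name__}: {seconds:.4f} s for 100 runs')
```

On a seven-row frame the difference is unlikely to be meaningful; a larger frame gives a more useful comparison.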
