How to filter rows containing specific string values with an AND operator

My question is an extension of the question answered quite well in this link:

I've reproduced that answer below; it keeps only the rows whose string contains the word "ball":

In [3]: df[df['ids'].str.contains("ball")]
Out[3]:
     ids  vals
0  aball     1
1  bball     2
3  fball     4

Now my question is: what if I have long sentences in my data, and I want to identify the strings that contain both "ball" AND "field"? Rows where only one of the two words occurs should be thrown away, and only the rows whose string has both words in it should be kept.

df[df['ids'].str.contains("ball")]

would become:

df[df['ids'].str.contains("ball") & df['ids'].str.contains("field")]

If you prefer neater code:

contains_balls = df['ids'].str.contains("ball")
contains_fields = df['ids'].str.contains("field")

filtered_df = df[contains_balls & contains_fields]
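
To try the combined mask end to end, here is a minimal, self-contained sketch. The sample sentences and the names df_demo, has_ball, has_field and both are made up for illustration. Passing regex=False (match the words literally) and na=False (treat missing values as non-matches) is an optional choice assumed here, not something the snippet above requires.

```python
import pandas as pd

# Hypothetical sample data, not taken from the question.
df_demo = pd.DataFrame({
    "ids": [
        "the ball rolled across the field",
        "he kicked the ball",
        "an empty field",
        None,
    ]
})

# One boolean mask per word; na=False makes missing values count as "no match".
has_ball = df_demo["ids"].str.contains("ball", regex=False, na=False)
has_field = df_demo["ids"].str.contains("field", regex=False, na=False)

# & is the element-wise AND: a row survives only if both masks are True.
both = df_demo[has_ball & has_field]
print(both)
```

Only the first row should remain, since it is the only one containing both words.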

If you have more than two words, you can use this (note that it is slower than foxyblue's method):

l = ['ball', 'field']
df.ids.apply(lambda x: all(y in x for y in l))
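
The apply call above returns a boolean Series rather than the filtered rows. A minimal sketch of using it as a mask follows; the sample data and the names df_demo, words, mask and filtered are hypothetical, and the code assumes the column has no missing values (the in test would fail on NaN).

```python
import pandas as pd

# Hypothetical sample data and word list for illustration.
df_demo = pd.DataFrame({"ids": ["ball is field", "ball is wa", "doll is field"]})
words = ["ball", "field"]

# all(...) is True only when every word is a substring of the cell.
mask = df_demo["ids"].apply(lambda text: all(w in text for w in words))

# Index the frame with the boolean mask to keep the matching rows.
filtered = df_demo[mask]
print(filtered)
```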

You could use np.logical_and.reduce to AND together one str.contains mask per word, which handles any number of words.

df[np.logical_and.reduce([df['ids'].str.contains(w) for w in ['ball', 'field']])]

In [96]: df
Out[96]:
             ids
0  ball is field
1     ball is wa
2  doll is field

In [97]: df[np.logical_and.reduce([df['ids'].str.contains(w) for w in ['ball', 'field']])]
Out[97]:
             ids
0  ball is field
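
If this filter is needed in several places, the same idea can be wrapped in a small helper. This is only a sketch: contains_all is a hypothetical name, the word list is assumed to be non-empty, and regex=False / na=False are assumptions added here so that the words are matched literally and missing values count as non-matches.

```python
import numpy as np
import pandas as pd


def contains_all(series, words):
    """Return a boolean array: True where the string contains every word."""
    # Build one mask per word (literal match, NaN treated as no match).
    masks = [series.str.contains(w, regex=False, na=False) for w in words]
    # logical_and.reduce ANDs the masks together element-wise.
    return np.logical_and.reduce(masks)


df_demo = pd.DataFrame({"ids": ["ball is field", "ball is wa", "doll is field"]})
mask = contains_all(df_demo["ids"], ["ball", "field", "is"])
result = df_demo[mask]
print(result)
```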

Yet another RegEx approach:

In [409]: df
Out[409]:
               ids
0   ball and field
1  ball, just ball
2      field alone
3   field and ball

In [410]: pat = r'(?:ball.*field|field.*ball)'

In [411]: df[df['ids'].str.contains(pat)]
Out[411]:
              ids
0  ball and field
3  field and ball
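
The pattern above lists both orderings by hand. For a slightly longer word list it could be generated instead, as in the sketch below. The names words, orderings, pat and result are hypothetical; re.escape is added as a precaution against regex metacharacters in the words. The number of orderings grows factorially, so this is probably only practical for two or three words.

```python
import itertools
import re

import pandas as pd

words = ["ball", "field"]

# Join each ordering of the words with ".*", then OR the orderings together.
orderings = itertools.permutations(re.escape(w) for w in words)
pat = "(?:" + "|".join(".*".join(p) for p in orderings) + ")"
print(pat)

df_demo = pd.DataFrame(
    {"ids": ["ball and field", "ball, just ball", "field alone", "field and ball"]}
)
result = df_demo[df_demo["ids"].str.contains(pat)]
print(result)
```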
