Pandas DataFrame: keep rows if a column contains any of the specified partial strings

Question

I have a pandas DataFrame. Below is a sample table.

Event   Text
A       something/AWAIT TO SHIP hello          
B       13579
C       AWAITING SHIP
D       24613
E       nan

I want to keep only the rows whose Text column contains the words "AWAIT TO SHIP", or contains the string 13579 or 24613. Below is my desired table:

Event   Text
A       something/AWAIT TO SHIP hello          
B       13579
D       24613

Below is the code I tried:

df_STH001_2 = df_STH001[df_STH001['Text'].str.contains("AWAIT TO SHIP") == True | df_STH001['Text'].str.contains("13579") == True | df_STH001['Text'].str.contains("24613") == True]

Below is the error I get:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Answer 1

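You should not explicitly compare against == True; just use the result of the call to contains directly. In Python, | binds more tightly than ==, so without parentheses your expression is parsed as a chained comparison, and evaluating it requires the truth value of a whole Series. That is what raises the ValueError.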
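To make the precedence issue concrete, here is a minimal sketch on a small made-up Series (the name `s` and its values are illustrative assumptions, not the asker's data). It shows the unparenthesized form raising, and two forms that work:

```python
import pandas as pd

# Hypothetical sample data, used only to illustrate the precedence issue
s = pd.Series(["AWAIT TO SHIP", "13579", "other"])

a = s.str.contains("AWAIT TO SHIP")
b = s.str.contains("13579")

# Without parentheses this parses as: a == (True | b) == True,
# a chained comparison that needs bool(Series) and raises ValueError.
try:
    a == True | b == True
    raised = False
except ValueError:
    raised = True

# Parenthesizing each comparison works, and so does dropping "== True".
mask_parens = (a == True) | (b == True)
mask_plain = a | b
```

Both `mask_parens` and `mask_plain` are boolean Series that can be used to index the frame; the plain form is the more idiomatic of the two.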

Using your sample, we first define the DataFrame:

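import numpy as np
import pandas as pd

# Note: 13579 and 24613 are stored as integers here, not as strings
df1 = pd.DataFrame(data=[
    ('A', 'something/AWAIT TO SHIP hello'),
    ('B', 13579),
    ('C', 'AWAITING SHIP'),
    ('D', 24613),
    ('E', np.nan)], columns=['Event', 'Text'])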

Then I build an intermediate mask from your conditions:

In [18]: mask = df1.Text.str.contains('AWAIT TO SHIP') | \
                df1.Text.str.contains('13579') | \
                df1.Text.str.contains('24613')

Now you can index the original DataFrame using this mask.

In [19]: df1.loc[mask]
Out[19]: 
  Event                           Text
0     A  something/AWAIT TO SHIP hello
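Note that the output above contains only row A. In this sample frame 13579 and 24613 are integers, so .str.contains returns NaN for those rows rather than True. The following is a sketch of one way to get the desired table, assuming it is acceptable to cast the Text column to strings before searching. The names `patterns` and `result` are illustrative, and joining the patterns into a single regular expression is a substitute for the three separate contains calls, not part of the original answer.

```python
import re

import numpy as np
import pandas as pd

df1 = pd.DataFrame(data=[
    ('A', 'something/AWAIT TO SHIP hello'),
    ('B', 13579),
    ('C', 'AWAITING SHIP'),
    ('D', 24613),
    ('E', np.nan)], columns=['Event', 'Text'])

# Substrings to look for; re.escape guards against regex metacharacters
patterns = ['AWAIT TO SHIP', '13579', '24613']
regex = '|'.join(re.escape(p) for p in patterns)

# Cast to str so the integer rows become searchable text.
# na=False treats any value that is still missing as "no match".
mask = df1['Text'].astype(str).str.contains(regex, na=False)

result = df1.loc[mask]
print(result)
```

With this mask, rows A, B and D are kept, while C ("AWAITING SHIP" does not contain "AWAIT TO SHIP") and E (missing) are dropped.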

Answer 1
1 Accepted 2018-03-12 21:07:15