str.contains: match only exact values

Question

I have the following list:

personnages = ['Stanley','Kevin', 'Franck']

I want to use the str.contains function to create a new pandas DataFrame, df3:

df3 = df2[df2['speaker'].str.contains('|'.join(personnages))]

However, if a row of the speaker column contains something like 'Stanley & Kevin', I don't want it in df3.

How can I improve my code to do this?
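To make the problem concrete, here is a minimal reproduction. The sample `speaker` values are hypothetical stand-ins for the asker's df2. Because `str.contains` matches the pattern anywhere in the string, a row that names two characters still passes the filter.

```python
import pandas as pd

personnages = ['Stanley', 'Kevin', 'Franck']

# Hypothetical sample data standing in for the asker's df2
df2 = pd.DataFrame({'speaker': ['Stanley', 'Stanley & Kevin', 'Franck', 'Nobody']})

# 'Stanley|Kevin|Franck' is matched as a substring, so 'Stanley & Kevin' is kept too
df3 = df2[df2['speaker'].str.contains('|'.join(personnages))]
print(df3['speaker'].tolist())  # ['Stanley', 'Stanley & Kevin', 'Franck']
```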

Answer 1

Here is what I would do:

import pandas as pd

# toy data
df = pd.DataFrame({'speaker': ['Stanley & Kevin', 'Everybody',
                               'Kevin speaks', 'The speaker is Franck', 'Nobody']})

personnages = ['Stanley', 'Kevin', 'Franck']

pattern = '|'.join(personnages)
s = (df['speaker'].str
       .extractall(f'({pattern})')  # extract every occurrence of a name
       .groupby(level=0)[0]         # group the matches by df's original row
       .nunique().eq(1)             # True if the row names exactly one distinct character
    )
df.loc[s.index[s]]                  # keep only those rows

Output:

                 speaker
2           Kevin speaks
3  The speaker is Franck
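To see why this works, the sketch below runs the same toy data through the chain one step at a time; the variable names `matches`, `n_names`, and `result` are introduced here purely for illustration. Rows with no name at all ('Everybody', 'Nobody') produce no matches in `extractall`, so they drop out before the `nunique` filter is even applied.

```python
import pandas as pd

df = pd.DataFrame({'speaker': ['Stanley & Kevin', 'Everybody',
                               'Kevin speaks', 'The speaker is Franck', 'Nobody']})
personnages = ['Stanley', 'Kevin', 'Franck']
pattern = '|'.join(personnages)

# One row per match; the index is a MultiIndex of (original row, match number)
matches = df['speaker'].str.extractall(f'({pattern})')
print(matches)

# Number of distinct names per original row; rows without any match are absent
n_names = matches.groupby(level=0)[0].nunique()
print(n_names.to_dict())  # {0: 2, 2: 1, 3: 1}

# Keep only the rows that mention exactly one distinct name
keep = n_names[n_names == 1].index
result = df.loc[keep]
print(result['speaker'].tolist())  # ['Kevin speaks', 'The speaker is Franck']
```

Note that this keeps rows where a name appears inside a longer sentence, which is the main difference from an exact, whole-string match.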

Answer 2

You'll want to mark the start and end of the string in your regex, so that it only matches values consisting of a single name:

import pandas as pd

speakers = ['Stanley', 'Kevin', 'Frank', 'Kevin & Frank']
df = pd.DataFrame({'speaker': speakers})
print(df)
#          speaker
# 0        Stanley
# 1          Kevin
# 2          Frank
# 3  Kevin & Frank

r = '|'.join(speakers[:-1])  # all names except the last (combined) one, for the sake of example

# ^ marks the start of the string and $ the end; (?:...) groups without capturing,
# which avoids pandas' "has match groups" warning from str.contains
print(df[df['speaker'].str.contains(f'^(?:{r})$')])
#    speaker
# 0  Stanley
# 1    Kevin
# 2    Frank
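Anchoring the whole pattern with `^...$` amounts to requiring a full-string match. As a sketch, assuming pandas 1.1 or later (where `Series.str.fullmatch` was added), the same filter can be written without explicit anchors. Since the names are plain strings rather than patterns, `Series.isin` also works: it compares whole values and needs no regex at all. Wrapping each name in `re.escape` is a precaution added here in case a name ever contains regex metacharacters; the sample data is hypothetical.

```python
import re

import pandas as pd

personnages = ['Stanley', 'Kevin', 'Franck']
# Hypothetical sample data
df = pd.DataFrame({'speaker': ['Stanley', 'Kevin', 'Franck', 'Stanley & Kevin']})

# Regex route: escape each name, then require the whole string to match one of them
pattern = '|'.join(map(re.escape, personnages))
by_regex = df[df['speaker'].str.fullmatch(pattern)]

# Plain equality route: isin keeps rows whose value is exactly one of the names
by_isin = df[df['speaker'].isin(personnages)]

print(by_regex['speaker'].tolist())  # ['Stanley', 'Kevin', 'Franck']
print(by_isin['speaker'].tolist())   # ['Stanley', 'Kevin', 'Franck']
```

`isin` is the simpler choice when only exact names should pass; the regex route is worth keeping if the allowed values ever need wildcards.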

2 answers

Answer 1
Score 2, 2019-12-20 15:19:24

Answer 2
Score 0, accepted, 2019-12-20 15:19:09
