str.contains: matching only an exact value
I have the following list:
personnages = ['Stanley','Kevin', 'Franck']
I want to use the str.contains function to create a new pandas DataFrame, df3:
df3 = df2[df2['speaker'].str.contains('|'.join(personnages))]
However, if a row of the speaker column contains 'Stanley & Kevin', I don't want it in df3.
How can I improve my code to do this?
Here is what I would do:
import pandas as pd

# toy data
df = pd.DataFrame({'speaker': ['Stanley & Kevin', 'Everybody',
                               'Kevin speaks', 'The speaker is Franck', 'Nobody']})
personnages = ['Stanley', 'Kevin', 'Franck']
pattern = '|'.join(personnages)

# s is True for rows that mention exactly one distinct name from the list
s = (df['speaker'].str
       .extractall(f'({pattern})')  # extract every listed name found in each row
       .groupby(level=0)[0]         # group the matches by df's row
       .nunique().eq(1)             # keep rows with exactly one distinct name
    )
df.loc[s.index[s]]
Output:
                 speaker
2           Kevin speaks
3  The speaker is Franck
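A variation on the same idea, as a rough sketch rather than a definitive fix: the plain '|'.join pattern would also match a name inside a longer word, and would misbehave if a name contained regex metacharacters. Assuming that matters for your data, the names can be escaped with re.escape and wrapped in \b word boundaries. The n_names variable is just an illustrative name.

```python
import re
import pandas as pd

df = pd.DataFrame({'speaker': ['Stanley & Kevin', 'Everybody', 'Kevin speaks',
                               'The speaker is Franck', 'Nobody']})
personnages = ['Stanley', 'Kevin', 'Franck']

# \b stops 'Kevin' from matching inside a longer word;
# re.escape guards against names containing regex metacharacters.
pattern = r'\b(?:' + '|'.join(map(re.escape, personnages)) + r')\b'

# Number of distinct listed names found in each row.
n_names = df['speaker'].str.findall(pattern).apply(lambda names: len(set(names)))

# Keep only the rows that mention exactly one distinct name.
df3 = df[n_names == 1]
print(df3)
```

Like the answer above, this counts distinct names, so a row that repeats the same name twice is still kept.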
You'll want to anchor your regex at the start and end of the string, so that a row matches only when it consists of a single name:
import pandas as pd
speakers = ['Stanley', 'Kevin', 'Frank', 'Kevin & Frank']
df = pd.DataFrame([{'speaker': speaker} for speaker in speakers])
         speaker
0        Stanley
1          Kevin
2          Frank
3  Kevin & Frank
r = '|'.join(speakers[:-1])  # all but the last one, for the sake of example
# ^ marks the start of the string and $ the end;
# (?:...) groups the alternatives without capturing, which avoids pandas' match-group warning
df[df['speaker'].str.contains(f'^(?:{r})$')]
   speaker
0  Stanley
1    Kevin
2    Frank
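If the whole cell has to equal one of the names, as in this anchored example, a regex may not be needed at all. A minimal sketch, assuming exact, case-sensitive equality is what you want, using Series.isin (the names and exact variables are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'speaker': ['Stanley', 'Kevin', 'Frank', 'Kevin & Frank']})
names = ['Stanley', 'Kevin', 'Frank']

# isin compares whole values, so 'Kevin & Frank' is not selected.
exact = df[df['speaker'].isin(names)]
print(exact)
```

This sidesteps regex escaping entirely, but unlike str.contains it will not find a name embedded in a longer string such as 'Kevin speaks'.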