Python: compare two lists of strings in a Pandas DataFrame
You can use pd.Series.str.extract:
import pandas as pd
df = pd.DataFrame({'bgrams': [['hello','goodbye'],['dog','cat'],['cow']], 'label':[None,None,None]})
df
# bgrams label
#0 [hello, goodbye] None
#1 [dog, cat] None
#2 [cow] None
labels=['cat','goodbye']
regex='('+'|'.join(labels)+')'
df['label']=df.bgrams.astype(str).str.extract(regex)
Output:
df
bgrams label
0 [hello, goodbye] goodbye
1 [dog, cat] cat
2 [cow] NaN
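One caveat with the pattern above: because the list is converted to a string first, a label such as 'cat' would also match inside a longer word like 'category', and labels containing regex metacharacters would be misread. A minimal sketch of one way to guard against that, assuming whole-word matches are what you want (the 'category' row is a made-up example, not part of the original data):

```python
import re
import pandas as pd

# Hypothetical data: the last row contains 'category' to show the substring issue
df = pd.DataFrame({'bgrams': [['hello', 'goodbye'], ['dog', 'cat'], ['category']]})
labels = ['cat', 'goodbye']

# Escape each label and require word boundaries on both sides,
# so 'cat' no longer matches inside 'category'
regex = r'\b(' + '|'.join(map(re.escape, labels)) + r')\b'

# expand=False returns a Series instead of a one-column DataFrame
df['label'] = df['bgrams'].astype(str).str.extract(regex, expand=False)
print(df)
```

Here the third row gets NaN rather than 'cat'.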
For multiple matches, you can use pd.Series.str.findall:
df = pd.DataFrame({'bgrams': [['hello','goodbye','cat'],['dog','cat'],['cow']], 'label':[None,None,None]})
df
# bgrams label
#0 [hello, goodbye, cat] None
#1 [dog, cat] None
#2 [cow] None
labels=['cat','goodbye']
regex='('+'|'.join(labels)+')'
df['label']=df.bgrams.astype(str).str.findall(regex)
Output:
df
bgrams label
0 [hello, goodbye, cat] [goodbye, cat]
1 [dog, cat] [cat]
2 [cow] []
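If exact element matches are enough, a regex is not strictly needed. The following is a sketch of an alternative (not the approach used above) that compares the list elements directly against a set of labels, assuming every cell in bgrams is a list of strings:

```python
import pandas as pd

df = pd.DataFrame({'bgrams': [['hello', 'goodbye', 'cat'], ['dog', 'cat'], ['cow']]})
labels = {'cat', 'goodbye'}  # a set gives fast membership checks

# Keep, in their original order, the words of each row that appear in labels
df['label'] = df['bgrams'].apply(lambda words: [w for w in words if w in labels])
print(df)
```

This avoids the string conversion entirely, so partial-word matches cannot occur, at the cost of a Python-level loop per row.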