Python: compare two lists of strings in a Pandas dataframe
You can use pd.Series.str.extract:
import pandas as pd
df = pd.DataFrame({'bgrams': [['hello','goodbye'],['dog','cat'],['cow']], 'label':[None,None,None]})
df
# bgrams label
#0 [hello, goodbye] None
#1 [dog, cat] None
#2 [cow] None
labels=['cat','goodbye']
regex='('+'|'.join(labels)+')'
# expand=False returns a Series instead of a one-column DataFrame
df['label']=df.bgrams.astype(str).str.extract(regex, expand=False)
Output:
df
bgrams label
0 [hello, goodbye] goodbye
1 [dog, cat] cat
2 [cow] NaN
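Because the regex runs against the string form of each list, a label such as 'cat' would also match inside a longer word such as 'category', and a label containing regex metacharacters would be misread as a pattern. The sketch below is not part of the original answer. It shows one possible way to guard against both, assuming whole-word matches are what is wanted; the 'category' row is made-up data for illustration.

```python
import re
import pandas as pd

# Hypothetical data: 'category' is included to show the substring pitfall.
df = pd.DataFrame({'bgrams': [['hello', 'goodbye'], ['dog', 'category'], ['cow']]})
labels = ['cat', 'goodbye']

# Escape each label and require word boundaries on both sides,
# so 'cat' does not match inside 'category'.
regex = r'\b(' + '|'.join(re.escape(label) for label in labels) + r')\b'

# expand=False returns a Series rather than a one-column DataFrame.
df['label'] = df['bgrams'].astype(str).str.extract(regex, expand=False)
print(df)
```

With this pattern, only the first row gets a label; the 'category' row stays NaN.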
For multiple matches, you can use pd.Series.str.findall:
df = pd.DataFrame({'bgrams': [['hello','goodbye','cat'],['dog','cat'],['cow']], 'label':[None,None,None]})
df
# bgrams label
#0 [hello, goodbye, cat] None
#1 [dog, cat] None
#2 [cow] None
labels=['cat','goodbye']
regex='('+'|'.join(labels)+')'
df['label']=df.bgrams.astype(str).str.findall(regex)
Output:
df
bgrams label
0 [hello, goodbye, cat] [goodbye, cat]
1 [dog, cat] [cat]
2 [cow] []
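If the bgrams column really holds Python lists, the comparison can also be done on the lists themselves, without converting them to strings or building a regex. This is an alternative sketch rather than the approach from the answer above; the name label_set is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'bgrams': [['hello', 'goodbye', 'cat'], ['dog', 'cat'], ['cow']]})
labels = ['cat', 'goodbye']
label_set = set(labels)  # set membership tests are fast

# For each row, keep the bgrams that also appear in labels,
# preserving the order in which they occur in the row.
df['label'] = df['bgrams'].apply(lambda words: [w for w in words if w in label_set])
print(df)
```

This compares whole strings only, so partial matches inside longer words cannot occur.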