Python: compare two lists of strings in a Pandas DataFrame
I would like to check whether each word in the labels list exists in each list in the column 'bigrams'.
If one of these words exists in a row's bigram list, I would like to replace that row's label None with the matching word.
I tried writing two for loops, but it doesn't work. I also tried a list comprehension.
How can I do this?
You can use pd.Series.str.extract:
import pandas as pd

df = pd.DataFrame({'bgrams': [['hello','goodbye'],['dog','cat'],['cow']], 'label':[None,None,None]})
df
# bgrams label
#0 [hello, goodbye] None
#1 [dog, cat] None
#2 [cow] None
labels=['cat','goodbye']
regex='('+'|'.join(labels)+')'
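# astype(str) turns each list into its string form, e.g. "['hello', 'goodbye']";
# expand=False returns a Series instead of a one-column DataFrame.
df['label']=df.bgrams.astype(str).str.extract(regex, expand=False)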
Output:
df
bgrams label
0 [hello, goodbye] goodbye
1 [dog, cat] cat
2 [cow] NaN
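One caveat with this approach, offered as a hedged sketch rather than part of the original answer: because each list is converted to its string form and searched with an unanchored pattern, a label such as 'cat' would also match inside a longer word like 'category', and a label containing regex metacharacters (., +, ( and so on) would alter the pattern. Escaping each label with re.escape and wrapping the alternation in \b word boundaries addresses both. The example row containing 'category' is made up for illustration.

```python
import re
import pandas as pd

# Illustrative data: 'category' contains 'cat' as a substring but is a different word.
df = pd.DataFrame({'bgrams': [['hello', 'goodbye'], ['dog', 'category'], ['cow']],
                   'label': [None, None, None]})
labels = ['cat', 'goodbye']

# re.escape neutralizes regex metacharacters inside labels;
# \b requires a word boundary on both sides, so 'cat' no longer matches inside 'category'.
regex = r'\b(' + '|'.join(map(re.escape, labels)) + r')\b'

# With one capture group and expand=False, str.extract returns a Series (NaN where no match).
df['label'] = df['bgrams'].astype(str).str.extract(regex, expand=False)
print(df)
```

Note that \b works at the word level, not the list-element level: a label 'york' would still match inside an element 'new york'. If a label must equal a whole list element, comparing the list elements directly avoids the string conversion altogether.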
For multiple matches, you can use pd.Series.str.findall:
df = pd.DataFrame({'bgrams': [['hello','goodbye','cat'],['dog','cat'],['cow']], 'label':[None,None,None]})
df
# bgrams label
#0 [hello, goodbye, cat] None
#1 [dog, cat] None
#2 [cow] None
labels=['cat','goodbye']
regex='('+'|'.join(labels)+')'
df['label']=df.bgrams.astype(str).str.findall(regex)
Output:
df
bgrams label
0 [hello, goodbye, cat] [goodbye, cat]
1 [dog, cat] [cat]
2 [cow] []
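The question also mentions trying a list comprehension. A minimal sketch of that route, assuming the column holds real Python lists (not strings) as in the example above: each list element is checked for exact membership in a set of labels, so there is no string conversion and no partial-word matching. The column name first_label is hypothetical, added only to show the single-match variant next to the all-matches one.

```python
import pandas as pd

df = pd.DataFrame({'bgrams': [['hello', 'goodbye', 'cat'], ['dog', 'cat'], ['cow']],
                   'label': [None, None, None]})
labels = ['cat', 'goodbye']
label_set = set(labels)  # set membership is an O(1) average-time lookup

# All matches per row: keep each list element that is exactly one of the labels.
df['label'] = df['bgrams'].apply(lambda words: [w for w in words if w in label_set])

# First match per row, or None when nothing matches (hypothetical extra column).
df['first_label'] = df['bgrams'].apply(
    lambda words: next((w for w in words if w in label_set), None))
print(df)
```

Because the comparison is on whole elements, 'cat' here matches only an element that is exactly 'cat', never a longer word that contains it.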
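Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost this page, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.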