How to count how many words from a list occur in a dataframe column?
I have a dataframe with the following layout, and the following list:
S/N Summary
1 government government spending spending
2 government money spending spending
list_1 = ['government', 'money', 'spending']
I would like to count how many unique words from my list appear in each row of the dataframe.
Expected output:
S/N Summary List 1
1 government government spending spending 2
2 government money spending spending 3
Try this:
set_1 = set(list_1)
# Store the per-row count in a new column, matching the expected output
df['List 1'] = df['Summary'].str.split().map(lambda words: len(set_1.intersection(words)))
First we split the strings into lists of words; then, for each list words, we compute the size of its intersection with set_1, which effectively counts the unique matches.
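For reference, here is a minimal end-to-end sketch of this approach, rebuilding the sample data from the question. The column name "List 1" is taken from the expected output; building the dataframe by hand this way is an assumption made for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed construction)
df = pd.DataFrame({
    "S/N": [1, 2],
    "Summary": [
        "government government spending spending",
        "government money spending spending",
    ],
})
list_1 = ["government", "money", "spending"]

# Convert the list to a set once, so each row only needs one intersection
set_1 = set(list_1)

# Split each summary on whitespace, then count the distinct words
# that also appear in set_1
df["List 1"] = df["Summary"].str.split().map(
    lambda words: len(set_1.intersection(words))
)

print(df["List 1"].tolist())  # [2, 3]
```

Note that str.split() with no arguments splits on whitespace only, so punctuation attached to a word (for example "spending,") would not match; the sample data has none.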
You can use sets instead of lists:
set_1 = {'government', 'money', 'spending', 'example', 'example', 'example'}
# This returns a value of 4 because there are four unique words:
len(set_1)
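The same idea can be applied to a single row of the question's data. This is only a sketch in plain Python; the variable names summary, unique_words, matches and count are made up for illustration.

```python
list_1 = ["government", "money", "spending"]
summary = "government government spending spending"

# A set keeps each word only once, so repeated words collapse
unique_words = set(summary.split())

# Keep only the words that are also in list_1, then count them
matches = unique_words & set(list_1)
count = len(matches)

print(count)  # 2
```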