How to count how many words from a list occur in each row of a dataframe?

I have the following dataframe and the following list:

S/N Summary
1   government government spending spending
2   government money spending spending 

list_1 = ['government', 'money', 'spending']

For each row, I would like to count the number of unique words from my list that appear in the Summary column.

Expected output:

S/N Summary                                    List 1
1   government government spending spending    2
2   government money spending spending         3

Try this:

set_1 = set(list_1)
# For each Summary, count the distinct words that also appear in set_1
df['List 1'] = df['Summary'].str.split().map(lambda words: len(set_1.intersection(words)))

First we split each string into a list of words. Then, for each list (words), we compute the size of its intersection with set_1, which counts each matching word only once.
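To make the two steps concrete, here is a small pure-Python sketch of what the lambda does to one row at a time. The helper name count_unique_matches and the rows list are illustrative only and are not part of the answer; the row texts and list_1 are taken from the question.

```python
# Illustrative helper (not from the original answer): mirrors the lambda above.
def count_unique_matches(summary, vocabulary):
    words = summary.split()  # split the text on whitespace
    # The intersection keeps each shared word once, so repeats are not counted.
    return len(set(vocabulary).intersection(words))

list_1 = ['government', 'money', 'spending']
rows = [
    'government government spending spending',
    'government money spending spending',
]

counts = [count_unique_matches(row, list_1) for row in rows]
print(counts)  # [2, 3]
```

With pandas, the same helper could be applied per row, for example df['Summary'].map(lambda s: count_unique_matches(s, list_1)).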

You can use sets instead of lists:

set_1 = {'government', 'money', 'spending', 'example', 'example', 'example'}

# This returns a value of 4 because there are four unique words:
len(set_1)
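The answer stops at len(set_1), so the following is only a guess at how it might be applied to the question: turning a row's words into a set removes the repeats, after which each remaining word can be checked against the list. The variable names summary, unique_words and matches are illustrative.

```python
list_1 = ['government', 'money', 'spending']
summary = 'government government spending spending'  # first row from the question

# Building a set from the split words drops the duplicate entries.
unique_words = set(summary.split())

# Count how many of the distinct words also appear in list_1.
matches = sum(word in list_1 for word in unique_words)
print(matches)  # 2
```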
