How do I count the words from a list that appear in a dataframe column?

Question

I have a dataframe with the following layout, and the following list:

S/N Summary
1   government government spending spending
2   government money spending spending 

list_1 = ['government', 'money', 'spending']

For each row, I would like to count how many unique words from my list appear in the Summary column.

Expected output:

S/N Summary                                    List 1
1   government government spending spending    2
2   government money spending spending         3

Answer 1

Try this:

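# Convert the list to a set so it can be intersected with each row's words.
set_1 = set(list_1)
# Split each summary into words and count the distinct list words it contains.
df['List 1'] = df['Summary'].str.split().map(lambda words: len(set_1.intersection(words)))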

First we split each string into a list of words. Then, for each list words, we compute the size of its intersection with set_1. Because a set holds each word only once, this counts the unique matches per row.
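For reference, here is a minimal end-to-end sketch of that approach using the sample data from the question. It assumes the Summary column contains only whitespace-separated strings with no missing values, and that matching should be case-sensitive; the column name 'List 1' is taken from the expected output.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "S/N": [1, 2],
    "Summary": [
        "government government spending spending",
        "government money spending spending",
    ],
})
list_1 = ["government", "money", "spending"]

set_1 = set(list_1)

# str.split() turns each summary into a list of words; intersecting that
# list with set_1 keeps only the distinct list words present in the row.
df["List 1"] = df["Summary"].str.split().map(
    lambda words: len(set_1.intersection(words))
)

print(df)
```

With this data the new column holds 2 for the first row and 3 for the second, which matches the expected output.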

Answer 2

You can use sets instead of lists:

set_1 = {'government', 'money', 'spending', 'example', 'example', 'example'}

# This returns a value of 4 because there are four unique words:
len(set_1)
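To connect this back to the question, the following is a small sketch in plain Python, without pandas, of how sets give the per-row count. The variable names summary, row_words and matches are only illustrative.

```python
list_1 = ["government", "money", "spending"]
summary = "government government spending spending"

# set() discards repeated words, so each word is kept only once.
row_words = set(summary.split())

# The & operator keeps the words present in both sets.
matches = set(list_1) & row_words

print(len(matches))  # 2
```

The same two steps would have to be repeated for every row to fill in the whole column.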

Answer 1
1 2021-01-16 11:11:38

Answer 2
0 2021-01-16 11:14:36
