Count occurrences of list items in a DataFrame

Question

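I have a DataFrame and a list of words. I want to count how many of the words in the list occur in each cell of the DataFrame.

text
this is a test sentence
another sentence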

list = ["this", "test", "break"]

Expected result:

text	occurrence count
this is a test sentence	2
another sentence	0

My code does not work:

df["occurence_count"] = [df["text"].count(x) for x in list]

Answer 1

Maybe you can do it like this:

a = ['this', 'test', 'break']  # 'list' shouldn't be used as a variable name

df['occurence_count'] = (
    df['text'].str.split().explode()
    .isin(set(a)).groupby(level=0).sum()
)
>>> df
                      text  occurence_count
0  this is a test sentence                2
1         another sentence                0

Answer 2

You can do:

import re
l = ['this', 'test', 'break']
s = set(l)
# raw string so that \s is passed to the regex engine unchanged
df['occurence_count'] = df['text'].apply(
    lambda x: len(set(re.split(r'\s+', x)).intersection(s)))

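So you split each text into words, turn them into a set, intersect that set with your word list, and take the length of the result. Note that this counts each distinct word once, even if it appears several times in the text.

(By the way, don't use list as a variable name. It is not a keyword, but it is the name of a built-in type in Python, and assigning to it shadows the built-in.)

Output: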

                      text  occurence_count
0  this is a test sentence                2
1         another sentence                0
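
The two answers agree on the sample data but can differ when a word repeats: the explode/isin approach in Answer 1 counts every matching token, while the set intersection in Answer 2 counts each distinct word once. Here is a minimal sketch of that difference in plain Python, without pandas. The helper names count_distinct and count_total and the sample sentence are made up for illustration and are not part of either answer.

```python
import re

words = {"this", "test", "break"}  # same word list as in the question


def count_distinct(text, vocab):
    """Number of different vocab words present in text (set intersection)."""
    return len(set(re.split(r"\s+", text)) & vocab)


def count_total(text, vocab):
    """Number of tokens in text that are in vocab, repeats included."""
    return sum(token in vocab for token in text.split())


sample = "this test is a test"  # hypothetical sentence with a repeated word
print(count_distinct(sample, words))  # 2
print(count_total(sample, words))     # 3
```

Both helpers match whole, case-sensitive tokens, so "This" or "test." would not be counted unless the text is lowercased and stripped of punctuation first.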
