Count occurrences of list items in a dataframe

Question

I have a dataframe and a list of words. I want to count how often the words from the list appear in each cell of the dataframe.

text
this is a test sentence
another sentence

list = ["this", "test", "break"]

Result:

text	occurrence count
this is a test sentence	2
another sentence	0

My code doesn't work:

df["occurence_count"] = [df["text"].count(x) for x in list]

Answer 1

Maybe you could do this:

a = ['this', 'test', 'break']  # 'list' shouldn't be used as a variable name

df['occurence_count'] = (
    df['text'].str.split().explode()
    .isin(set(a)).groupby(level=0).sum()
)
>>> df
                      text  occurence_count
0  this is a test sentence                2
1         another sentence                0
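A possible alternative, not part of the answer above, counts regex matches per row with `Series.str.count`. It assumes whole-word, case-sensitive matching; unlike `str.split`, the `\b` word boundaries also treat punctuation as a separator, so a token such as `test,` would still be counted.

```python
import re
import pandas as pd

df = pd.DataFrame({"text": ["this is a test sentence", "another sentence"]})
a = ["this", "test", "break"]

# One alternation pattern; \b keeps "test" from matching inside "testing",
# and re.escape protects words that contain regex metacharacters
pattern = r"\b(?:" + "|".join(map(re.escape, a)) + r")\b"

# Series.str.count returns the number of non-overlapping matches in each row
df["occurence_count"] = df["text"].str.count(pattern)
print(df)
```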

Answer 2

You could do:

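import re

l = ['this', 'test', 'break']
s = set(l)
# Split on whitespace (raw string avoids an invalid-escape warning), keep the
# distinct words, and count how many of them are in the word set
df['occurence_count'] = df['text'].apply(
    lambda x: len(set(re.split(r'\s+', x)).intersection(s)))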

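So you split each string into words, turn them into a set, intersect it with the set built from your list, and take the length of the result. Because it works on sets, a listed word that appears several times in one row is counted only once.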
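A quick pure-Python comparison of the set-based count against a token-by-token count, using a made-up sentence that repeats "test" (the function names are illustrative):

```python
import re

words = {"this", "test", "break"}

def distinct_matches(text):
    # Set intersection: each matching word contributes at most 1
    return len(set(re.split(r"\s+", text)).intersection(words))

def total_matches(text):
    # Token by token: repeated words are counted every time they occur
    return sum(token in words for token in re.split(r"\s+", text))

sample = "this test is a test"
print(distinct_matches(sample), total_matches(sample))  # 2 3
```

Pick the set version if you want "how many of the listed words occur", and the token version if you want "how many occurrences".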

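(By the way, don't use `list` as a variable name: it is not a keyword but a built-in type in Python, and assigning to it shadows the built-in.)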
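A short illustration of what shadowing does, run at module level:

```python
# Binding the name `list` hides the built-in type in this scope
list = ["this", "test", "break"]

caught = None
try:
    list("abc")  # calls the list object bound above, not the built-in type
except TypeError as exc:
    caught = str(exc)
print(caught)  # 'list' object is not callable

del list  # drop the module-level name; lookups fall back to the built-in
print(list("abc"))  # ['a', 'b', 'c']
```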

Output:

                      text  occurence_count
0  this is a test sentence                2
1         another sentence                0


2 answers

Answer 1
1 2022-07-11 21:30:23

Answer 2
0 2022-07-11 21:32:42
