Counting word frequency in a pandas column, broken down by another column

Question

I have a dataframe with comments and their labels.

Comments	Label
I love my Teammates	Positive
We need higher pay	Suggestions
I hate my boss	Negative

I would like to get an output like this:

Word	Count	Positive	Negative	Suggestions
I	2	1	1	0
my	2	1	1	0
Teammates	1	1	0	0
love	1	1	0	0
We	1	0	0	1
need	1	0	0	1
higher	1	0	0	1
pay	1	0	0	1
hate	1	0	1	0
boss	1	0	1	0

I was able to get the overall word counts using

df.Comments.str.split(expand=True).stack().value_counts()

But I can't get the per-label counts. Any help would be appreciated!
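For anyone who wants to reproduce this, here is a minimal sketch of the sample data. It assumes the columns are named `Comments` and `Label`, as the snippet above implies, and that the comments are the three English sentences whose words appear in the answers' output. It only rebuilds the frame and reruns the word count from the question.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed column names:
# 'Comments' and 'Label', matching the code used in the post).
df = pd.DataFrame({
    "Comments": ["I love my Teammates", "We need higher pay", "I hate my boss"],
    "Label": ["Positive", "Suggestions", "Negative"],
})

# Overall word counts: split on whitespace, flatten to one value per word, count.
word_counts = df.Comments.str.split(expand=True).stack().value_counts()
print(word_counts)
```

This gives one total per word but loses the link to `Label`, which is the missing piece.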

Answer 1

You can do the following:

import pandas as pd

out = (
    df.assign(Word=lambda df: df.Comments.str.split())   # create a 'Word' column holding each comment's list of words
      .explode('Word')  # explode the lists so that each word gets its own row
      .pipe(lambda df: pd.crosstab(df.Word, df.Label)) # cross-tabulate the 'Word' and 'Label' columns
      .assign(Count=lambda df: df.sum(axis=1))   # row total across all labels
      .reset_index()  # move 'Word' from the index to a column
      .rename_axis(columns=None) # remove the name ('Label') of the columns axis
)

Output:

>>> out 

        Word  Negative  Positive  Suggestions  Count
0          I         1         1            0      2
1  Teammates         0         1            0      1
2         We         0         0            1      1
3       boss         1         0            0      1
4       hate         1         0            0      1
5     higher         0         0            1      1
6       love         0         1            0      1
7         my         1         1            0      2
8       need         0         0            1      1
9        pay         0         0            1      1
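The table above has the right numbers, but its columns and rows are in alphabetical order rather than the layout asked for in the question. Here is a small sketch of one way to rearrange it. It assumes the sample frame from the question, hard-codes the three label names, and passes `ignore_index=True` to `explode` so the exploded rows get a fresh index before cross-tabulating (my own choice, not part of the answer).

```python
import pandas as pd

# Sample data reconstructed from the question (assumed column names).
df = pd.DataFrame({
    "Comments": ["I love my Teammates", "We need higher pay", "I hate my boss"],
    "Label": ["Positive", "Suggestions", "Negative"],
})

table = (
    df.assign(Word=df["Comments"].str.split())
      .explode("Word", ignore_index=True)            # one row per word, fresh 0..n-1 index
      .pipe(lambda d: pd.crosstab(d["Word"], d["Label"]))
      .assign(Count=lambda d: d.sum(axis=1))         # row total across all labels
      .rename_axis(columns=None)
      .reset_index()
)

# Column order requested in the question; label names are hard-coded for this sample.
wanted = ["Word", "Count", "Positive", "Negative", "Suggestions"]
result = (
    table[wanted]
    .sort_values("Count", ascending=False, kind="stable")  # most frequent words first
    .reset_index(drop=True)
)
print(result)
```

With real data you would probably build `wanted` from `df["Label"].unique()` instead of typing the labels by hand.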

Answer 2

You can use:

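out = (
    df['Comments'].str.split().explode().to_frame('Word')  # one row per word
      .join(df['Label'])   # attach each comment's label via the shared index
      .assign(value=1)     # helper column to count
      .pivot_table(values='value', index='Word', columns='Label', aggfunc='count', fill_value=0)
      .assign(Count=lambda x: x.sum(axis=1))  # row total across all labels
)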

Output:

>>> out
Label      Negative  Positive  Suggestions  Count
Word                                             
I                 1         1            0      2
Teammates         0         1            0      1
We                0         0            1      1
boss              1         0            0      1
hate              1         0            0      1
higher            0         0            1      1
love              0         1            0      1
my                1         1            0      2
need              0         0            1      1
pay               0         0            1      1
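As a quick sanity check, the sketch below builds the per-label counts both ways (cross-tabulation as in Answer 1, pivoting a helper column as in Answer 2) and compares them. It assumes the sample frame from the question; the variable names `long`, `via_crosstab` and `via_pivot` are just illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed column names).
df = pd.DataFrame({
    "Comments": ["I love my Teammates", "We need higher pay", "I hate my boss"],
    "Label": ["Positive", "Suggestions", "Negative"],
})

# Long format: one row per (word, label) pair.
words = df["Comments"].str.split().explode().rename("Word")
long = words.to_frame().join(df["Label"]).reset_index(drop=True)

# Answer 1 style: cross-tabulate words against labels.
via_crosstab = pd.crosstab(long["Word"], long["Label"])

# Answer 2 style: count a helper column of ones in a pivot table.
via_pivot = long.assign(value=1).pivot_table(
    values="value", index="Word", columns="Label", aggfunc="count", fill_value=0
)

# Compare labels and cell values (values compared element-wise, ignoring dtype).
same = (
    list(via_crosstab.index) == list(via_pivot.index)
    and list(via_crosstab.columns) == list(via_pivot.columns)
    and bool((via_crosstab.to_numpy() == via_pivot.to_numpy()).all())
)
print(same)
```

On this sample the two tables should agree, so choosing between them is mostly a matter of taste; `crosstab` avoids the helper column.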

2 solutions

Solution 1
1 2021-12-14 18:14:38

Solution 2
0 2021-12-14 18:07:05
