Pandas value_counts with an additional column

Question

I have the following DataFrame:

sentence   userid  topic
hello        1001  smalltalk
hi           1002  smalltalk
hello        1002  smalltalk
how are you? 1003  question
hello        1004  smalltalk
what is new? 1005  question
hi           1006  smalltalk
hello        1007  smalltalk

Using pandas value_counts

Input:

df['sentence'].value_counts()

Output:

hello 4
hi 2
how are you? 1
what is new? 1

What I actually want is the same value counts, with one specific column shown next to them:

hello 4 smalltalk
hi 2 smalltalk
how are you? 1 question
what is new? 1 question

Answer 1

IIUC, the OP wants to keep the result of df[['sentence', 'topic']].value_counts() as a DataFrame for further manipulation or visualization. This can be achieved with groupby(), aggregating the counts over the columns of interest into a new column named count:

import pandas as pd

#Generate dataframe
df = pd.DataFrame({'userid':    [1001, 1002, 1002, 1003, 1004, 1005, 1006, 1007],    
                    'sentence': ['hello', "hi", 'hello', "how are you?", 'hello', "what is new?", "hi", 'hello'],    
                    'topic':    ["smalltalk", "smalltalk", "smalltalk", "question", "smalltalk", "question", "smalltalk", "smalltalk"],
                    })

#Aggregate counts with respect to the columns of interest in df
#(each (sentence, topic) pair is already unique after groupby, so no drop_duplicates is needed)
df2 = df.groupby(["sentence","topic"])["topic"].agg(["count"]) \
        .reset_index()

print(df2) 
#       sentence      topic  count
#0         hello  smalltalk      4
#1            hi  smalltalk      2
#2  how are you?   question      1
#3  what is new?   question      1
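
As a variation on the above, here is a minimal sketch assuming the same example data. groupby(...).size() counts the rows in each group directly, and reset_index(name="count") names the resulting column. The final sort is an addition made here so the rows come out in descending count order, as value_counts() would give; the variable name counts is only illustrative.

```python
import pandas as pd

# Same example data as in the question
df = pd.DataFrame({
    "userid": [1001, 1002, 1002, 1003, 1004, 1005, 1006, 1007],
    "sentence": ["hello", "hi", "hello", "how are you?",
                 "hello", "what is new?", "hi", "hello"],
    "topic": ["smalltalk", "smalltalk", "smalltalk", "question",
              "smalltalk", "question", "smalltalk", "smalltalk"],
})

counts = (
    df.groupby(["sentence", "topic"])
      .size()                          # number of rows per (sentence, topic) pair
      .reset_index(name="count")       # back to a flat DataFrame with a "count" column
      .sort_values("count", ascending=False)  # most frequent first
      .reset_index(drop=True)
)

print(counts)
```

Note that the column has to be accessed as counts["count"], since counts.count refers to the DataFrame method of the same name.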

Answer 2

mozway provided a solution in the comments:

df[['sentence', 'topic']].value_counts()
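
A minimal sketch of how that one-liner might be used, assuming the example data from the question and pandas 1.1 or later (where DataFrame.value_counts is available). The call returns a Series indexed by (sentence, topic) pairs; reset_index(name="count") is one way to flatten it into the layout the question asks for. The names vc and out, and the column name count, are choices made here.

```python
import pandas as pd

# Same example data as in the question
df = pd.DataFrame({
    "userid": [1001, 1002, 1002, 1003, 1004, 1005, 1006, 1007],
    "sentence": ["hello", "hi", "hello", "how are you?",
                 "hello", "what is new?", "hi", "hello"],
    "topic": ["smalltalk", "smalltalk", "smalltalk", "question",
              "smalltalk", "question", "smalltalk", "smalltalk"],
})

# Count each distinct (sentence, topic) combination; the result is a
# Series with a two-level index, sorted by count in descending order
vc = df[["sentence", "topic"]].value_counts()

# Flatten the index levels into ordinary columns next to the counts
out = vc.reset_index(name="count")

print(out)
```

If a sentence could appear under more than one topic, each combination gets its own row, which may or may not be what is wanted.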
