Pandas value_counts with added column

I have a dataframe like this:

sentence   userid  topic
hello        1001  smalltalk
hi           1002  smalltalk
hello        1002  smalltalk
how are you? 1003  question
hello        1004  smalltalk
what is new? 1005  question
hi           1006  smalltalk
hello        1007  smalltalk

Using pandas value_counts:

Input:

df['sentence'].value_counts()

Output:

hello 4
hi 2
how are you? 1
what is new? 1

What I actually want is the same value counts, with one specific column added alongside:

hello 4 smalltalk
hi 2 smalltalk
how are you? 1 question
what is new? 1 question

IIUC, the OP wants to keep the result of df[['sentence', 'topic']].value_counts() as a DataFrame for further manipulation or visualization. This can be done with groupby(), aggregating the row counts for the columns of interest into a new column named count:

import pandas as pd

# Build the example dataframe
df = pd.DataFrame({'userid':   [1001, 1002, 1002, 1003, 1004, 1005, 1006, 1007],
                   'sentence': ['hello', "hi", 'hello', "how are you?", 'hello', "what is new?", "hi", 'hello'],
                   'topic':    ["smalltalk", "smalltalk", "smalltalk", "question", "smalltalk", "question", "smalltalk", "smalltalk"],
                   })

# Count the rows for each (sentence, topic) pair and store the result in a "count" column
df2 = (df.groupby(["sentence", "topic"])["topic"]
         .agg(["count"])
         .reset_index())  # group keys are already unique, so drop_duplicates() is not needed

print(df2)
#       sentence      topic  count
#0         hello  smalltalk      4
#1            hi  smalltalk      2
#2  how are you?   question      1
#3  what is new?   question      1

mozway gave a solution in the comments:

df[['sentence', 'topic']].value_counts()
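
As a rough sketch of how that one-liner maps onto the output the question asks for: value_counts() on two columns returns a Series indexed by (sentence, topic) pairs, and reset_index(name="count") flattens it into a DataFrame. The names counts, counts_df and the column name "count" are illustrative choices only, and the snippet assumes pandas 1.1 or later, where DataFrame.value_counts is available.

```python
import pandas as pd

# Same example data as in the question
df = pd.DataFrame({
    "userid": [1001, 1002, 1002, 1003, 1004, 1005, 1006, 1007],
    "sentence": ["hello", "hi", "hello", "how are you?",
                 "hello", "what is new?", "hi", "hello"],
    "topic": ["smalltalk", "smalltalk", "smalltalk", "question",
              "smalltalk", "question", "smalltalk", "smalltalk"],
})

# Series with a (sentence, topic) MultiIndex, sorted by count in descending order
counts = df[["sentence", "topic"]].value_counts()

# Flatten the index into ordinary columns; the counts go into a column named "count"
counts_df = counts.reset_index(name="count")

print(counts_df)
```

Note that this counts each distinct (sentence, topic) pair, so a sentence that appears under more than one topic would produce one row per topic.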

