Pandas value_counts with added column
I have a dataframe like this:
sentence userid topic
hello 1001 smalltalk
hi 1002 smalltalk
hello 1002 smalltalk
how are you? 1003 question
hello 1004 smalltalk
what is new? 1005 question
hi 1006 smalltalk
hello 1007 smalltalk
Using pandas value_counts
Input:
df['sentence'].value_counts()
Output:
hello 4
hi 2
how are you? 1
what is new? 1
What I actually want is the same value counts, with one specific column added alongside:
hello 4 smalltalk
hi 2 smalltalk
how are you? 1 question
what is new? 1 question
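IIUC, the OP wants to keep the result of df[['sentence', 'topic']].value_counts()
as a DataFrame so it can be used for further manipulation or visualization. This can be done with groupby(),
counting the rows for each combination of the columns of interest and storing the result
in a new column named count: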
import pandas as pd
#Generate dataframe
df = pd.DataFrame({'userid': [1001, 1002, 1002, 1003, 1004, 1005, 1006, 1007],
'sentence': ['hello', "hi", 'hello', "how are you?", 'hello', "what is new?", "hi", 'hello'],
'topic': ["smalltalk", "smalltalk", "smalltalk", "question", "smalltalk", "question", "smalltalk", "smalltalk"],
})
# Count the rows for each (sentence, topic) pair and store the result in a "count" column.
# groupby already yields one row per unique pair, so drop_duplicates() is not needed.
df2 = (
    df.groupby(["sentence", "topic"])
    .size()
    .reset_index(name="count")
)
print(df2)
# sentence topic count
#0 hello smalltalk 4
#1 hi smalltalk 2
#2 how are you? question 1
#3 what is new? question 1
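Since the answer describes the goal in terms of df[['sentence', 'topic']].value_counts(), here is a minimal sketch of that route as an alternative to groupby(). It assumes pandas 1.1 or newer, where DataFrame.value_counts() is available. The variable name counts is only illustrative and is not part of the original answer.

```python
import pandas as pd

# Same example data as in the answer above
df = pd.DataFrame({
    "userid": [1001, 1002, 1002, 1003, 1004, 1005, 1006, 1007],
    "sentence": ["hello", "hi", "hello", "how are you?", "hello", "what is new?", "hi", "hello"],
    "topic": ["smalltalk", "smalltalk", "smalltalk", "question", "smalltalk", "question", "smalltalk", "smalltalk"],
})

# Count each (sentence, topic) pair; the result is a Series with a MultiIndex,
# sorted by count in descending order.
# reset_index(name="count") turns it back into a flat DataFrame.
counts = df[["sentence", "topic"]].value_counts().reset_index(name="count")

print(counts)
```

Unlike the groupby() version, which sorts by the group keys, this returns rows ordered from the most frequent pair to the least frequent, which is closer to what df['sentence'].value_counts() prints.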