简体   繁体   中英

Pandas value_counts with added column

I got data frame as follow:

sentence   userid  topic
hello        1001  smalltalk
hi           1002  smalltalk
hello        1002  smalltalk
how are you? 1003  question
hello        1004  smalltalk
what is new? 1005  question
hi           1006  smalltalk
hello        1007  smalltalk

With the help of Pandas value_counts

Input:

df['sentence'].value_counts()

Output:

hello 4
hi 2
how are you? 1
what is new? 1

What I really wish to get is the same value count with the a specific column added next to it:

hello 4 smalltalk
hi 2 smalltalk
how are you? 1 question
what is new? 1 question

IIUC, OP needs to keep interested results of df[['sentence', 'topic']].value_counts() in the form of DataFrame using module for further manipulation\visulaizataions. So this can be achieved by groupby() and aggregating the counts of interested multiple variables\columns under a new column count within DataFrame:

import pandas as pd

#Generate dataframe
df = pd.DataFrame({'userid':    [1001, 1002, 1002, 1003, 1004, 1005, 1006, 1007],    
                    'sentence': ['hello', "hi", 'hello', "how are you?", 'hello', "what is new?", "hi", 'hello'],    
                    'topic':    ["smalltalk", "smalltalk", "smalltalk", "question", "smalltalk", "question", "smalltalk", "smalltalk"],
                    })

#Aggregate counts with respect to interested columns in df
df2 = df.groupby(["sentence","topic"])["topic"].agg(["count"]) \
        .reset_index() \
        .drop_duplicates() #remove duplicates

print(df2) 
#       sentence      topic  count
#0         hello  smalltalk      4
#1            hi  smalltalk      2
#2  how are you?   question      1
#3  what is new?   question      1

mozway provided the solution in a comment :

df[['sentence', 'topic']].value_counts()

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM