简体   繁体   中英

Create separate crosstab in pandas DataFrame for each category in a column

I have a pandas DataFrame df which looks like this (Sample):

df = pd.DataFrame({'Time' : ['Low','High','Medium','Low','Low','High','Low','High','Medium'],
                   'Sentiment' : ['Pos','Neg','Neg','Pos','Pos','Neg','Neg','Pos','Neg'],
                   'Product' : ['P1','P1','P1','P2','P1','P2','P2','P2','P1']})

My question is simple (for which I searched before posting), I need to create a crosstab which can be created easily with this command:

pd.crosstab(df['Time'],df['Sentiment'])

And gives the desired output:

Sentiment  Neg  Pos
Time               
High         2    1
Low          1    3
Medium       2    0

Now there is one more column in the data called Product which has around 50 unique categories (for sample I have included 2) and I need to create similar crosstab objects for all the categories, how can I do that?

My expected output is shown below:

Crosstab for Product1 (P1):

P1

Sentiment  Neg  Pos
Time
High         1    0
Low          0    2
Medium       2    0

Crosstab for Product1 (P2):

P2

Sentiment  Neg  Pos
Time
High         1    1
Low          1    1
Medium       0    0

How can I get 50 crosstab in one go?

Use crosstab with list of both columns - so get MultiIndex :

df1 = pd.crosstab([df['Product'], df['Time']],df['Sentiment'])
print (df1)
Sentiment       Neg  Pos
Product Time            
P1      High      1    0
        Low       0    2
        Medium    2    0
P2      High      1    1
        Low       1    1

For add all possible values in MultiIndex add DataFrame.unstack and DataFrame.stack :

df1 = pd.crosstab([df['Product'], df['Time']],df['Sentiment']).unstack(fill_value=0).stack()
print (df1)
Sentiment       Neg  Pos
Product Time            
P1      High      1    0
        Low       0    2
        Medium    2    0
P2      High      1    1
        Low       1    1
        Medium    0    0

And then select by first level by DataFrame.loc :

print (df1.loc['P1'])
Sentiment  Neg  Pos
Time               
High         1    0
Low          0    2
Medium       2    0 


print (df1.loc['P2'])
Sentiment  Neg  Pos
Time               
High         1    1
Low          1    1

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM