简体   繁体   中英

How do I create a Pivot table in Python for two categorical variables?

My data looks smth like:

Index Job Y Balance
1 A Yes 1
2 B No 2
3 A No 5
4 A No 0
5 B Yes 4

I want to summarize the data in the following format, with job in the row and Y in the column:

Yes No
A 1 2
B 1 1

I have tried the following code:

pivot = df.pivot_table(index =['job'], columns = ['y'], values = ['balance'], aggfunc ='count')

I am not able to run the pivot without using balance in the value parameter. How do I get the above result?

You can try this:

data = {'Index': [1, 2, 3, 4, 5],
'job': ['A', 'B', 'A', 'A', 'B'],
'y': ['Yes', 'No', 'No', 'No', 'Yes'],
'Balance': [1, 2, 5, 0, 4]}
df = pd.DataFrame(data)

pivot = df.groupby(['job', 'y']).size().unstack(fill_value=0)

print(pivot)

To be able to do this, you will need to do df.groupby() first, to group the data on Job and Y columns to get the count of yes/no using the below code:

df2 = df.groupby(['Job', 'Y'], as_index=False).count()

  Job    Y  Balance
0   A   No        2
1   A  Yes        1
2   B   No        1
3   B  Yes        1

You can then use df2.pivot() to pivot this grouped table into the desired format:

df2.pivot(index='Job', columns='Y', values='Balance')

Y    No  Yes
Job         
A     2    1
B     1    1

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM