
Plotting number of occurrences of column value

I hope the title is accurate enough, I wasn't quite sure how to phrase it.

Anyhow, my problem is that I have a Pandas df which looks like the following:

                              Customer       Source  CustomerSource
0                                Apple            A             141
1                                Apple            B              36
2                            Microsoft            A             143
3                               Oracle            C             225
4                                  Sun            C             151

This df is derived from a larger dataset. The value of CustomerSource is the total number of occurrences of each Customer and Source pair: in this case there are 141 occurrences of Apple with Source A, 225 of Oracle with Source C, and so on.

I want to draw a stacked bar plot with all Customer s on the x-axis and the values of CustomerSource stacked on top of each other on the y-axis, similar to the example below. Any hints as to how I would proceed with this?

(image: example stacked bar plot)

You can use pivot or unstack to reshape and then DataFrame.plot.bar :

df.pivot(index='Customer', columns='Source', values='CustomerSource').plot.bar(stacked=True)

df.set_index(['Customer','Source'])['CustomerSource'].unstack().plot.bar(stacked=True)
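As a minimal, self-contained sketch of the two one-liners above: the sample data is copied from the question, while the variable names ( wide_pivot , wide_unstack , ax ) and the headless Agg backend are assumptions made only so the snippet runs on its own.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display; drop for interactive use
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Customer": ["Apple", "Apple", "Microsoft", "Oracle", "Sun"],
    "Source": ["A", "B", "A", "C", "C"],
    "CustomerSource": [141, 36, 143, 225, 151],
})

# Long -> wide: one row per Customer, one column per Source.
# Keyword arguments are used because recent pandas versions no longer
# accept positional arguments in DataFrame.pivot.
wide_pivot = df.pivot(index="Customer", columns="Source", values="CustomerSource")

# Same reshape via a MultiIndex and unstack
wide_unstack = df.set_index(["Customer", "Source"])["CustomerSource"].unstack()

# Each row becomes one bar, each column one stacked segment
ax = wide_pivot.plot.bar(stacked=True)
ax.set_ylabel("CustomerSource")
```

Customer / Source pairs that do not occur in the data show up as NaN in the wide table.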

Or, if there are duplicate Customer , Source pairs, use pivot_table or groupby with the aggregate sum :

print (df)
    Customer Source  CustomerSource
0      Apple      A             141 <-same Apple, A
1      Apple      A             200 <-same Apple, A
2      Apple      B              36
3  Microsoft      A             143
4     Oracle      C             225
5        Sun      C             151

df1 = df.pivot_table(index='Customer', columns='Source', values='CustomerSource', aggfunc='sum')
print (df1)
Source         A     B      C
Customer                     
Apple      341.0  36.0    NaN <-141 + 200 = 341
Microsoft  143.0   NaN    NaN
Oracle       NaN   NaN  225.0
Sun          NaN   NaN  151.0


(df.pivot_table(index='Customer', columns='Source', values='CustomerSource', aggfunc='sum')
   .plot.bar(stacked=True))

df.groupby(['Customer','Source'])['CustomerSource'].sum().unstack().plot.bar(stacked=True)
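A short sketch of why the aggregating variants are needed when pairs repeat. It uses the duplicated sample data shown above; the names pivot_failed , by_pivot_table and by_groupby are hypothetical and only there for illustration. Plain pivot is expected to refuse the duplicates, while pivot_table and groupby sum them.

```python
import pandas as pd

# Sample data with a duplicated (Apple, A) pair, as in the answer above
df = pd.DataFrame({
    "Customer": ["Apple", "Apple", "Apple", "Microsoft", "Oracle", "Sun"],
    "Source": ["A", "A", "B", "A", "C", "C"],
    "CustomerSource": [141, 200, 36, 143, 225, 151],
})

# Plain pivot cannot decide which of the two (Apple, A) values to keep
try:
    df.pivot(index="Customer", columns="Source", values="CustomerSource")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table aggregates duplicates with the given function
by_pivot_table = df.pivot_table(
    index="Customer", columns="Source", values="CustomerSource", aggfunc="sum"
)

# groupby + sum + unstack produces the same wide table
by_groupby = df.groupby(["Customer", "Source"])["CustomerSource"].sum().unstack()

print(by_pivot_table)
```

Either result can then be passed to .plot.bar(stacked=True) in the same way as before.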

It is also possible to swap the columns:

df.pivot(index='Customer', columns='Source', values='CustomerSource').plot.bar(stacked=True)


df.pivot(index='Source', columns='Customer', values='CustomerSource').plot.bar(stacked=True)

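To see what the swap does to the table before plotting, here is a small sketch on the question's sample data (the names by_customer and by_source are assumptions). Swapping index and columns should give the transpose of the first table, so the x-axis shows Source and the stacked segments are Customer s.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Customer": ["Apple", "Apple", "Microsoft", "Oracle", "Sun"],
    "Source": ["A", "B", "A", "C", "C"],
    "CustomerSource": [141, 36, 143, 225, 151],
})

# Rows = Customer (x-axis), columns = Source (stacked segments)
by_customer = df.pivot(index="Customer", columns="Source", values="CustomerSource")

# Rows = Source (x-axis), columns = Customer (stacked segments)
by_source = df.pivot(index="Source", columns="Customer", values="CustomerSource")

print(by_source)

# The swapped table matches the transpose of the first one
print(by_source.equals(by_customer.T))
```

If the wide table already exists, calling .T on it before .plot.bar(stacked=True) is an alternative to pivoting a second time.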

