简体   繁体   中英

How to pivot a data frame created from group using pivot

  1. updated from vendorname to companyname

2.1 after apply option 1 from @mozway, got a new error saying ValueError: Index contains duplicate entries, cannot reshape .

2.1.1 update the data frame to reflect the error.(which is changing the score to good for both B and C under the same zip code that produce two ranks 1)

2.1.2 using "first" instead of "dense" solved the error

---updated above

why my index "zip" couldn't be found after Groupby?

I checked the type of my dataframe op ( pandas.core.frame.DataFrame ) which is as same as the example pandas documentation provided. Pandas example

df = pd.DataFrame({'zip': ["46062","11236","11236","11236","11236","11236","11236","11236","11236","11236","11236","11236","11236","11236","11236","11236","11236","11236"], 
                   'goodbadscore': ["good","good","good","good","good","bad","bad","good","good","good","bad","good","good","good","good","good","bad","good"],
                   'postlcode' : ["46062","11236","11236","11236","11236","46062","11236","46062","11236","11236","11236","11236","11236","11236","11236","11236","11236","11236"],
                   'companyname': ["A","B","C","C","C","A","A","A","B","B","A","A","B","A","A","B","A","A"]}
                   )
                   
print(df)
print(type(df))

### count vendor under zip and count good
op = df.groupby(['zip','companyname'], as_index=True).aggregate(
    countinzipcode=('companyname', 'count'),
    good=('goodbadscore', lambda s: s.eq('good').sum())
)

### calculate % on time = on time count / total count
op['ontime%'] = op['good'] / op['countinzipcode'] * 100

### total zip rank and rank within zip
op['rankingbyallzip'] = op['ontime%'].rank(ascending=False, method='dense').astype(int)
op['rankwithinzip'] = op.groupby("zip")["ontime%"].rank("dense", ascending=False)
print(op)
print(type(op))

### keyerror
op.pivot(index = "zip", columns="rankwithinzip", values="company")

If you do not want to affect your op dataframe, you can do:

op.reset_index().pivot(index = "zip", columns="rankwithinzip", values="companyname")

output:

rankwithinzip 1.0  2.0  3.0
zip                        
11236           C    B    A
46062           A  NaN  NaN

If you don't care about op , then use @Henry Ecker solution

I suggest to add .astype(int) to make your rankwithinzip column int type:

op['rankwithinzip'] = op.groupby("zip")["ontime%"].rank("dense", ascending=False).astype(int)

Thus your final table will be:

  zip   1   2   3
11236   C   B   A
46062   A       

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM