简体   繁体   中英

How to use agg_func = 'All' in Python Pivot_table

I have an input dataframe like given below

df = pd.DataFrame({'person_id' :[1,1,1,2,2,2,2,2,2],'level_1': ['L1FR','L1Date','L1value','L1FR','L1Date','L1value','L2FR','L2Date','L2value'], 'val3':['Fasting','11/4/2005',1.33,'Random','18/1/2007',4.63,'Fasting','18/1/2017',8.63]})

It looks like as shown below

在此处输入图片说明

I would like to convert my output to look like as shown below

在此处输入图片说明

This is what I tried based on response from SO forum

g = df.level_1.str[-2:]
df.pivot_table(index='person_id', columns=g, values='val3',aggfunc='first')

This provides only the first record of each group (or person) like shown below which is very close to my expected output.Basically I am trying to do sort of transformation/rearrange the way input data is shown

在此处输入图片说明

but I want to have all the records. In real data, I have more than 100k records and each person can be repeated multiple times.

When I use g.groupby(g).cumcount() as index, I get NA's for half of the records and I would like to avoid it. I don't know why this issue happens.

Any other way to approach this is also helpful. Can you help me get this output?

You're pretty close to have what you want.

The trick here is to add another index to your second level like this :

df = pd.DataFrame({'person_id' :[1,1,1,2,2,2,2,2,2],'level_1': ['L1FR','L1Date','L1value','L1FR','L1Date','L1value','L2FR','L2Date','L2value'], 'val3':['Fasting','11/4/2005',1.33,'Random','18/1/2007',4.63,'Fasting','18/1/2017',8.63]})
g = df.level_1.str[-2:]
# Extracting level's number
df['lvl'] = df.level_1.apply(lambda x: int(''.join(filter(str.isdigit, x))))
# Then you pivot with person_id and lvl
df = df.pivot_table(index=['person_id', 'lvl'], columns=g, values='val3', aggfunc='first')

Output should be :

            level_1     FR          te          ue
person_id   lvl             
        1   1       Fasting     11/4/2005   1.33
        2   1           Random      18/1/2007   4.63
            2           Fasting     18/1/2017   8.63

Then if you reset the level 1 index like this :

df.reset_index(level=1).drop("lvl", axis=1)

The output is :

level_1     FR          te          ue
person_id           
1           Fasting     11/4/2005   1.33
2           Random      18/1/2007   4.63
2           Fasting     18/1/2017   8.63

And there you go !

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM