
Un-stacking a Pandas DataFrame with multiple types of observation for each variable

Given a stacked DataFrame like this, in which there are three types of observation for each variable:

     ID      Variable  Value
0  1056     Run Score     89
1  1056      Run Rank     56
2  1056    Run Decile      8
3  1056    Swim Score     92
4  1056     Swim Rank     64
5  1056   Swim Decile      8
6  1056   Cycle Score     96
7  1056    Cycle Rank     32
8  1056  Cycle Decile      9

How can I unstack it to be like this:

Variable    ID  Decile  Rank  Score  Event
0         1056       8    56     89    Run
0         1056       8    64     92   Swim
0         1056       9    32     96  Cycle

This is how I'm currently doing it, but it feels over-complicated:

import pandas as pd

data = [(1056, "Run Score", 89),
    (1056, "Run Rank", 56),
    (1056, "Run Decile", 8),
    (1056, "Swim Score", 92),
    (1056, "Swim Rank", 64),
    (1056, "Swim Decile", 8),
    (1056, "Cycle Score", 96),
    (1056, "Cycle Rank", 32),
    (1056, "Cycle Decile", 9)]

cols = ["ID", "Variable", "Value"]

all_data = pd.DataFrame(data=data, columns=cols)

event_names = ["Run", "Swim", "Cycle"]

event_data_all = []

for event_name in event_names:
    event_data = all_data.loc[all_data["Variable"].str.startswith(event_name)]
    # pd.np was removed in pandas 2.0; the string name "sum" works across versions
    event_data = event_data.pivot_table(index="ID", columns="Variable", values="Value", aggfunc="sum")
    event_data.reset_index(inplace=True)
    event_data.rename(columns={
        event_name + " Score": "Score",
        event_name + " Rank": "Rank",
        event_name + " Decile": "Decile"
    }, inplace=True)
    event_data["Event"] = event_name
    event_data_all.append(event_data)

all_data_final = pd.concat(event_data_all)

Is there a better way?

The idea is to split Variable into two new columns and then pivot on them:

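# keep only the rows for known events; .copy() avoids SettingWithCopyWarning below
all_data = all_data.loc[all_data["Variable"].str.startswith(tuple(event_names))].copy()
# "Run Score" -> Event="Run", b="Score"
all_data[['Event','b']] = all_data['Variable'].str.split(expand=True)

df = all_data.set_index(['ID','Event','b'])['Value'].unstack().reset_index().rename_axis(None, axis=1)
print(df)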
     ID  Event  Decile  Rank  Score
0  1056  Cycle       9    32     96
1  1056    Run       8    56     89
2  1056   Swim       8    64     92
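
Note that the result is sorted alphabetically by Event and puts Event second, while the question asks for Run, Swim, Cycle with Event last. A minimal sketch of one way to restore that order, assuming df looks like the output above (it is rebuilt by hand here so the snippet runs on its own; the name ordered is only for illustration):

```python
import pandas as pd

event_names = ["Run", "Swim", "Cycle"]

# Hand-built copy of the pivoted result shown above.
df = pd.DataFrame(
    {
        "ID": [1056, 1056, 1056],
        "Event": ["Cycle", "Run", "Swim"],
        "Decile": [9, 8, 8],
        "Rank": [32, 56, 64],
        "Score": [96, 89, 92],
    }
)

# An ordered categorical sorts by position in event_names, not alphabetically.
df["Event"] = pd.Categorical(df["Event"], categories=event_names, ordered=True)

# Sort the rows, then select the columns in the order the question asks for.
ordered = (
    df.sort_values(["ID", "Event"])[["ID", "Decile", "Rank", "Score", "Event"]]
    .reset_index(drop=True)
)
print(ordered)
```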

Thanks to @asongtoruin for another solution, which is especially useful if the data needs to be aggregated:

df = (all_data.pivot_table(index=['ID', 'Event'],
                          columns='b',
                          values='Value',
                          aggfunc='sum').reset_index().rename_axis(None, axis=1))
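
To illustrate why aggregation matters here: unstack needs each (ID, Event, b) combination to be unique, while pivot_table combines duplicates with aggfunc. A small sketch with made-up data, in which the "Score" row is deliberately duplicated (the frame dup and its values are hypothetical, not from the question):

```python
import pandas as pd

# Hypothetical data: the Run "Score" observation appears twice for ID 1056.
dup = pd.DataFrame(
    {
        "ID": [1056, 1056, 1056],
        "Event": ["Run", "Run", "Run"],
        "b": ["Score", "Score", "Rank"],
        "Value": [89, 1, 56],
    }
)

# unstack() cannot reshape an index with duplicate entries and raises ValueError.
try:
    dup.set_index(["ID", "Event", "b"])["Value"].unstack()
    unstack_failed = False
except ValueError:
    unstack_failed = True

# pivot_table() aggregates the duplicates instead (here by summing them).
wide = (
    dup.pivot_table(index=["ID", "Event"], columns="b", values="Value", aggfunc="sum")
    .reset_index()
    .rename_axis(None, axis=1)
)
print(unstack_failed)
print(wide)
```

Whether 'sum' is the right aggregation depends on what a duplicate means in your data; 'mean', 'first' or 'max' may fit better.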

Another solution is to extract the event name with a regular expression built from event_names:

event_names = ["Run", "Swim", "Cycle"]
# keep only the rows for known events; .copy() avoids SettingWithCopyWarning below
all_data = all_data.loc[all_data["Variable"].str.startswith(tuple(event_names))].copy()
# group 1 is the event name, group 2 is everything after the whitespace
pat = '(' + '|'.join(event_names) + r')\s+(.*)'
all_data[['Event','b']] = all_data['Variable'].str.extract(pat)

df = (all_data.pivot_table(index=['ID', 'Event'],
                          columns='b',
                          values='Value',
                          aggfunc='sum').reset_index().rename_axis(None, axis=1))
print(df)

     ID  Event  Decile  Rank  Score
0  1056  Cycle       9    32     96
1  1056    Run       8    56     89
2  1056   Swim       8    64     92
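
The extract approach is likely the safer choice when an event name can contain spaces, because a plain whitespace split then yields more than two pieces. A small sketch, assuming a hypothetical event called "Open Water Swim" (not in the question's data) and adding re.escape in case a name contains regex metacharacters:

```python
import re

import pandas as pd

# Hypothetical event names; "Open Water Swim" contains spaces.
event_names = ["Run", "Open Water Swim"]
variables = pd.Series(["Run Score", "Open Water Swim Score", "Open Water Swim Rank"])

# A whitespace split produces four columns here, so it cannot be
# assigned to just ['Event', 'b'].
split_width = variables.str.split(expand=True).shape[1]

# Anchoring the pattern on the known names keeps each event name intact.
pat = r"^(" + "|".join(re.escape(name) for name in event_names) + r")\s+(.*)$"
parts = variables.str.extract(pat)
parts.columns = ["Event", "b"]
print(split_width)
print(parts)
```

If one event name is a prefix of another, put the longer name first in event_names, since regex alternation tries the alternatives from left to right.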
