Un-stacking a Pandas DataFrame with multiple types of observation for each variable

Question

Given a stacked DataFrame like this, in which there are three types of observation for each variable:

     ID      Variable  Value
0  1056     Run Score     89
1  1056      Run Rank     56
2  1056    Run Decile      8
3  1056    Swim Score     92
4  1056     Swim Rank     64
5  1056   Swim Decile      8
6  1056   Cycle Score     96
7  1056    Cycle Rank     32
8  1056  Cycle Decile      9

How can I unstack it to be like this:

Variable    ID  Decile  Rank  Score  Event
0         1056       8    56     89    Run
0         1056       8    64     92   Swim
0         1056       9    32     96  Cycle

This is how I'm currently doing it, but it feels over-complicated:

import pandas as pd

data = [(1056, "Run Score", 89),
    (1056, "Run Rank", 56),
    (1056, "Run Decile", 8),
    (1056, "Swim Score", 92),
    (1056, "Swim Rank", 64),
    (1056, "Swim Decile", 8),
    (1056, "Cycle Score", 96),
    (1056, "Cycle Rank", 32),
    (1056, "Cycle Decile", 9)]

cols = ["ID", "Variable", "Value"]

all_data = pd.DataFrame(data=data, columns=cols)

event_names = ["Run", "Swim", "Cycle"]

event_data_all = []

for event_name in event_names:
    event_data = all_data.loc[all_data["Variable"].str.startswith(event_name)]
    # pd.np is no longer available in recent pandas; the string "sum" gives the same aggregation
    event_data = event_data.pivot_table(index="ID", columns="Variable", values="Value", aggfunc="sum")
    event_data.reset_index(inplace=True)
    event_data.rename(columns={
        event_name + " Score": "Score",
        event_name + " Rank": "Rank",
        event_name + " Decile": "Decile"
    }, inplace=True)
    event_data["Event"] = event_name
    event_data_all.append(event_data)

all_data_final = pd.concat(event_data_all)

Is there a better way?

Answer 1

The idea is to split Variable into two new columns and then pivot on them:

# .copy() avoids a SettingWithCopyWarning when the new columns are assigned
all_data = all_data.loc[all_data["Variable"].str.startswith(tuple(event_names))].copy()
all_data[['Event','b']] = all_data['Variable'].str.split(expand=True)

df = all_data.set_index(['ID','Event','b'])['Value'].unstack().reset_index().rename_axis(None, axis=1)
print(df)
     ID  Event  Decile  Rank  Score
0  1056  Cycle       9    32     96
1  1056    Run       8    56     89
2  1056   Swim       8    64     92
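
The result above is sorted alphabetically by Event and places Event second, whereas the layout asked for in the question lists Run, Swim, Cycle and puts Event last. A minimal sketch of one way to restore that order, assuming the event_names list from the question (the names parts, tidy and wide are illustrative only and not part of the original answer):

```python
import pandas as pd

data = [(1056, "Run Score", 89), (1056, "Run Rank", 56), (1056, "Run Decile", 8),
        (1056, "Swim Score", 92), (1056, "Swim Rank", 64), (1056, "Swim Decile", 8),
        (1056, "Cycle Score", 96), (1056, "Cycle Rank", 32), (1056, "Cycle Decile", 9)]
all_data = pd.DataFrame(data, columns=["ID", "Variable", "Value"])
event_names = ["Run", "Swim", "Cycle"]

# Split "Run Score" into an event part and a measure part.
parts = all_data["Variable"].str.split(expand=True)
parts.columns = ["Event", "Measure"]
tidy = pd.concat([all_data[["ID", "Value"]], parts], axis=1)

# One row per (ID, Event), one column per measure.
wide = (tidy.set_index(["ID", "Event", "Measure"])["Value"]
            .unstack()
            .reset_index()
            .rename_axis(None, axis=1))

# Sort events in the order given by event_names rather than alphabetically.
wide["Event"] = pd.Categorical(wide["Event"], categories=event_names, ordered=True)
wide = wide.sort_values(["ID", "Event"]).reset_index(drop=True)

# Put the columns in the order shown in the question.
wide = wide[["ID", "Decile", "Rank", "Score", "Event"]]
print(wide)
```

Using a categorical for the sort keeps the event order in one place (event_names), so adding an event only means extending that list.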

Thanks to @asongtoruin for another solution, which is especially useful if the data needs to be aggregated:

all_data.pivot_table(index=['ID', 'Event'],
                     columns='b',
                     values='Value',
                     aggfunc='sum').reset_index().rename_axis(None, axis=1)
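
To illustrate why aggregation can matter: if the same ID, Event and b combination occurs more than once, unstack cannot reshape the data, while pivot_table combines the duplicates with aggfunc. A small sketch, using a made-up frame (dup) in which "Run Score" appears twice; the values are hypothetical and not from the question:

```python
import pandas as pd

# Hypothetical data: the Run/Score observation is duplicated for ID 1056.
dup = pd.DataFrame({
    "ID": [1056, 1056, 1056],
    "Event": ["Run", "Run", "Run"],
    "b": ["Score", "Score", "Rank"],
    "Value": [89, 5, 56],
})

# unstack needs a unique index, so the duplicate makes it raise ValueError.
try:
    dup.set_index(["ID", "Event", "b"])["Value"].unstack()
    unstack_failed = False
except ValueError as exc:
    unstack_failed = True
    print("unstack failed:", exc)

# pivot_table aggregates the duplicates instead (here by summing them).
agg = (dup.pivot_table(index=["ID", "Event"], columns="b",
                       values="Value", aggfunc="sum")
          .reset_index()
          .rename_axis(None, axis=1))
print(agg)
```

Whether "sum" is the right aggregation depends on the data; "mean", "first" or "max" may be more appropriate for scores and ranks.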

Another solution is to extract the two parts with a regex built from event_names:

event_names = ["Run", "Swim", "Cycle"]
all_data = all_data.loc[all_data["Variable"].str.startswith(tuple(event_names))].copy()
# raw strings keep \s from being treated as an invalid string escape
pat = r'(' + '|'.join(event_names) + r')\s+(.*)'
all_data[['Event','b']] = all_data['Variable'].str.extract(pat)

df = (all_data.pivot_table(index=['ID', 'Event'],
                          columns='b',
                          values='Value',
                          aggfunc='sum').reset_index().rename_axis(None, axis=1))
print(df)

     ID  Event  Decile  Rank  Score
0  1056  Cycle       9    32     96
1  1056    Run       8    56     89
2  1056   Swim       8    64     92
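
One case where the extract approach may be preferable to a plain split, as an assumption not stated in the answer, is an event name that itself contains a space. The sketch below uses a hypothetical "Open Water" event to compare the two; re.escape is added so that any regex metacharacters in the names are matched literally.

```python
import re
import pandas as pd

# Hypothetical event names; "Open Water" contains a space.
event_names = ["Run", "Open Water"]
variables = pd.Series(["Run Score", "Open Water Score", "Open Water Rank"])

# Whitespace split yields three columns, so it no longer maps to (Event, b).
split_cols = variables.str.split(expand=True)
print(split_cols.shape)

# Extracting by the known event names keeps each name intact.
pat = r"(" + "|".join(re.escape(name) for name in event_names) + r")\s+(.*)"
extracted = variables.str.extract(pat)
extracted.columns = ["Event", "b"]
print(extracted)
```

If event names can be prefixes of one another, listing the longer names first in the alternation is a sensible precaution, since the regex tries alternatives from left to right.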

solution1
3 ACCEPTED 2018-08-15 09:07:27