Given a stacked DataFrame like this, in which there are three types of observation for each variable:
ID Variable Value
0 1056 Run Score 89
1 1056 Run Rank 56
2 1056 Run Decile 8
3 1056 Swim Score 92
4 1056 Swim Rank 64
5 1056 Swim Decile 8
6 1056 Cycle Score 96
7 1056 Cycle Rank 32
8 1056 Cycle Decile 9
How can I unstack it to be like this:
Variable ID Decile Rank Score Event
0 1056 8 56 89 Run
0 1056 8 64 92 Swim
0 1056 9 32 96 Cycle
This is how I'm currently doing it, but it feels over-complicated:
import pandas as pd
data = [(1056, "Run Score", 89),
        (1056, "Run Rank", 56),
        (1056, "Run Decile", 8),
        (1056, "Swim Score", 92),
        (1056, "Swim Rank", 64),
        (1056, "Swim Decile", 8),
        (1056, "Cycle Score", 96),
        (1056, "Cycle Rank", 32),
        (1056, "Cycle Decile", 9)]
cols = ["ID", "Variable", "Value"]
all_data = pd.DataFrame(data=data, columns=cols)
event_names = ["Run", "Swim", "Cycle"]
event_data_all = []
for event_name in event_names:
    event_data = all_data.loc[all_data["Variable"].str.startswith(event_name)]
    # aggfunc="sum" replaces pd.np.sum, since pd.np was removed in pandas 2.0
    event_data = event_data.pivot_table(index="ID", columns="Variable", values="Value", aggfunc="sum")
    event_data.reset_index(inplace=True)
    event_data.rename(columns={
        event_name + " Score": "Score",
        event_name + " Rank": "Rank",
        event_name + " Decile": "Decile"
    }, inplace=True)
    event_data["Event"] = event_name
    event_data_all.append(event_data)
all_data_final = pd.concat(event_data_all)
Is there a better way?
The idea is to create two new columns by splitting Variable with str.split, then use them for pivoting:
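# .copy() avoids a SettingWithCopyWarning when adding columns to the filtered frame
all_data = all_data.loc[all_data["Variable"].str.startswith(tuple(event_names))].copy()
# split e.g. "Run Score" into Event="Run" and b="Score"
all_data[['Event','b']] = all_data['Variable'].str.split(expand=True)
df = (all_data.set_index(['ID','Event','b'])['Value']
              .unstack()
              .reset_index()
              .rename_axis(None, axis=1))
print(df)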
ID Event Decile Rank Score
0 1056 Cycle 9 32 96
1 1056 Run 8 56 89
2 1056 Swim 8 64 92
Thanks to @asongtoruin for another solution, which is especially useful if you need to aggregate the data:
df = (all_data.pivot_table(index=['ID', 'Event'],
                           columns='b',
                           values='Value',
                           aggfunc='sum').reset_index().rename_axis(None, axis=1))
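To illustrate when aggregation matters, here is a minimal sketch with made-up data in which the "Run Score" row appears twice for the same ID (the duplicate row and its value are assumptions for illustration only). In that case unstack cannot reshape the data because the index is not unique, while pivot_table combines the duplicates with the given aggfunc:

```python
import pandas as pd

# Hypothetical data: "Run Score" is recorded twice for ID 1056.
data = [(1056, "Run Score", 89),
        (1056, "Run Score", 1),
        (1056, "Run Rank", 56)]
all_data = pd.DataFrame(data, columns=["ID", "Variable", "Value"])
all_data[["Event", "b"]] = all_data["Variable"].str.split(expand=True)

# unstack() requires unique index entries, so it raises on the duplicate.
try:
    all_data.set_index(["ID", "Event", "b"])["Value"].unstack()
    unstack_failed = False
except ValueError:
    unstack_failed = True

# pivot_table() aggregates the duplicates instead (here 89 + 1).
df = (all_data.pivot_table(index=["ID", "Event"],
                           columns="b",
                           values="Value",
                           aggfunc="sum")
              .reset_index()
              .rename_axis(None, axis=1))
print(df)
```

Whether 'sum' is the right aggregation depends on the data; 'mean', 'first' or 'max' may fit better for scores and ranks.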
Another solution is to use str.extract with a pattern built from event_names:
event_names = ["Run", "Swim", "Cycle"]
all_data = all_data.loc[all_data["Variable"].str.startswith(tuple(event_names))].copy()
# raw string, so \s reaches the regex engine as-is: (Run|Swim|Cycle)\s+(.*)
pat = r'(' + '|'.join(event_names) + r')\s+(.*)'
all_data[['Event','b']] = all_data['Variable'].str.extract(pat)
df = (all_data.pivot_table(index=['ID', 'Event'],
                           columns='b',
                           values='Value',
                           aggfunc='sum').reset_index().rename_axis(None, axis=1))
print(df)
ID Event Decile Rank Score
0 1056 Cycle 9 32 96
1 1056 Run 8 56 89
2 1056 Swim 8 64 92
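One case where extract may be preferable to a plain whitespace split is when an event name itself contains a space, because split(expand=True) would then produce more than two columns. The sketch below assumes a hypothetical event called "Trail Run" (not part of the question's data) and uses re.escape so that any regex metacharacters in the names are matched literally:

```python
import re
import pandas as pd

# Hypothetical event names; "Trail Run" contains a space.
event_names = ["Trail Run", "Swim"]
data = [(1056, "Trail Run Score", 89),
        (1056, "Trail Run Rank", 56),
        (1056, "Swim Score", 92),
        (1056, "Swim Rank", 64)]
all_data = pd.DataFrame(data, columns=["ID", "Variable", "Value"])

# First group captures the event name, second group the rest of the label.
pat = r"(" + "|".join(re.escape(name) for name in event_names) + r")\s+(.*)"
all_data[["Event", "b"]] = all_data["Variable"].str.extract(pat)

df = (all_data.pivot_table(index=["ID", "Event"],
                           columns="b",
                           values="Value",
                           aggfunc="sum")
              .reset_index()
              .rename_axis(None, axis=1))
print(df)
```

If one event name is a prefix of another, the order of the alternatives in the pattern matters, so listing longer names first is probably the safer choice.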