I have a pandas dataframe with data like this:
df:
item day time data
0 item_0 2012-12-02 00:00:01 0.81
1 item_0 2012-12-02 00:00:02 0.07
2 item_0 2012-12-03 00:00:00 0.84
3 item_1 2012-12-02 00:00:01 0.47
The combination of item + day + time is unique.
I am trying to transform to:
item day time_0 time_1 time_2
0 item_0 2012-12-02 NaN 0.81 0.07
1 item_0 2012-12-03 0.84 NaN NaN
2 item_1 2012-12-02 NaN 0.47 ...
I have tried:
df_stage_1 = df.groupby(['item','day']).apply(lambda x: x['time'].tolist()).reset_index()
The code above produces a list of times per group, but the times are not aligned starting from 00:00:00. I could check each list and track the indexes of the missing times, so that NaN can be added to the value list at those indexes.
df_stage_1 = pd.DataFrame(df_stage_1[0].tolist())
The code above gives me a dataframe of (unaligned) time values, which I could align (see above) and append to the dataframe created in the previous step, but I can't work out how to get the values into the correct time-aligned columns.
You can use pd.pivot_table:
res = df.pivot_table(index=['item', 'day'], columns='time',
values='data', aggfunc='first').reset_index()
print(res)
time item day 00:00:00 00:00:01 00:00:02
0 item_0 2012-12-02 NaN 0.81 0.07
1 item_0 2012-12-03 0.84 NaN NaN
2 item_1 2012-12-02 NaN 0.47 NaN
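If you also want the time_0, time_1, time_2 headers from the question, something along these lines might work. This is only a sketch: the sample frame is rebuilt by hand with day and time stored as strings (an assumption about your dtypes), and it assumes time_N simply means the N-th distinct time in sorted order, which only matches "seconds since 00:00:00" when every second actually appears in the data.

```python
import pandas as pd

# Sample data from the question, rebuilt by hand (string dtypes are an assumption).
df = pd.DataFrame({
    "item": ["item_0", "item_0", "item_0", "item_1"],
    "day": ["2012-12-02", "2012-12-02", "2012-12-03", "2012-12-02"],
    "time": ["00:00:01", "00:00:02", "00:00:00", "00:00:01"],
    "data": [0.81, 0.07, 0.84, 0.47],
})

# One row per (item, day), one column per distinct time; missing cells become NaN.
wide = df.pivot_table(index=["item", "day"], columns="time",
                      values="data", aggfunc="first")

# pivot_table sorts the column labels, so position i is the i-th distinct time.
wide.columns = [f"time_{i}" for i in range(len(wide.columns))]
wide = wide.reset_index()
print(wide)
```

Renaming before reset_index keeps item and day out of the renaming step, so only the time columns get new labels.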
Another solution is set_index, unstack, reset_index:
df.set_index(['item', 'day', 'time'])['data'].unstack().reset_index()
time item day 00:00:00 00:00:01 00:00:02
0 item_0 2012-12-02 NaN 0.81 0.07
1 item_0 2012-12-03 0.84 NaN NaN
2 item_1 2012-12-02 NaN 0.47 NaN
Remember that df.unstack in pandas operates on the index: by default it takes the innermost level of the index and pivots it into the columns.
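To make the point about index levels concrete, here is a small sketch on the same sample data (rebuilt by hand, with string dtypes assumed). It shows the default innermost-level behaviour, picking a different level by name, and one practical difference between the two approaches: unstack needs the index to be unique, while pivot_table aggregates duplicates. The duplicated row is made up purely for illustration, since the question says item + day + time is unique.

```python
import pandas as pd

df = pd.DataFrame({
    "item": ["item_0", "item_0", "item_0", "item_1"],
    "day": ["2012-12-02", "2012-12-02", "2012-12-03", "2012-12-02"],
    "time": ["00:00:01", "00:00:02", "00:00:00", "00:00:01"],
    "data": [0.81, 0.07, 0.84, 0.47],
})

s = df.set_index(["item", "day", "time"])["data"]

# Default: the innermost index level ("time") is moved into the columns.
by_time = s.unstack()

# Any other level can be chosen by name (or by position).
by_day = s.unstack(level="day")

# Hypothetical duplicate row, only to show the failure mode of unstack.
dup = pd.concat([df, df.iloc[[0]]], ignore_index=True)
try:
    dup.set_index(["item", "day", "time"])["data"].unstack()
    raised = False
except ValueError:
    # unstack cannot reshape when the index contains duplicate entries
    raised = True

# pivot_table copes with the duplicate by aggregating (here: keep the first value).
agg = dup.pivot_table(index=["item", "day"], columns="time",
                      values="data", aggfunc="first")
print(by_time, by_day, raised, agg, sep="\n\n")
```

With unique keys, as in the question, both answers should give the same result, so the choice is mostly a matter of taste.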