
Format dataframe to equal length time-series format

I have a dataframe

time    item    value1      value2
-----------------------------------
1       1       3           4
2       1       2           5
1       2       3           5
3       2       2           1
2       3       3           6
3       3       2           5

and I would like to transform it to the following

time    item    value1      value2
-----------------------------------
1       1       3           4
2       1       2           5
3       1       nan         nan
1       2       3           5
2       2       nan         nan
3       2       2           1
1       3       nan         nan
2       3       3           6
3       3       2           5

where every item covers the same time range, and value1 and value2 are NaN for any (time, item) pair that is not in the original dataframe. I have tried a few variations of an outer join, but without success.

Is there an easy way to do it?

You can set time and item as the index, then use df.reindex with an index built by pd.MultiIndex.from_product, which contains every (time, item) combination:

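import pandas as pd
# Distinct time steps and items that occur in the data
time = df['time'].unique()
item = df['item'].unique()
# All (item, time) pairs; swaplevel puts time first to match the index set below
idx = pd.MultiIndex.from_product([item, time], names=['item', 'time']).swaplevel(0, 1)
df.set_index(['time', 'item']).reindex(idx).reset_index()  # missing pairs become NaN rows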

   time  item  value1  value2
0     1     1     3.0     4.0
1     2     1     2.0     5.0
2     3     1     NaN     NaN
3     1     2     3.0     5.0
4     2     2     NaN     NaN
5     3     2     2.0     1.0
6     1     3     NaN     NaN
7     2     3     3.0     6.0
8     3     3     2.0     5.0
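
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample data is copied from the question; the helper names pad_to_full_grid and pad_with_merge are hypothetical and only used for illustration. The second helper shows one way the merge-based idea from the question could be made to work, assuming pandas 1.2 or later for how="cross": build the full grid of (item, time) pairs first, then left-merge the data onto it.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "time":   [1, 2, 1, 3, 2, 3],
    "item":   [1, 1, 2, 2, 3, 3],
    "value1": [3, 2, 3, 2, 3, 2],
    "value2": [4, 5, 5, 1, 6, 5],
})


def pad_to_full_grid(frame):
    """Hypothetical helper: one row per (item, time) pair, NaN where missing."""
    items = sorted(frame["item"].unique())
    times = sorted(frame["time"].unique())
    # item varies slowest, so rows come out grouped by item, ordered by time.
    idx = pd.MultiIndex.from_product([items, times], names=["item", "time"])
    out = frame.set_index(["item", "time"]).reindex(idx).reset_index()
    # Restore the original column order.
    return out[list(frame.columns)]


def pad_with_merge(frame):
    """Hypothetical helper: same idea using merges instead of reindex."""
    items = frame[["item"]].drop_duplicates()
    times = frame[["time"]].drop_duplicates()
    # Cross join builds every (item, time) pair (assumes pandas >= 1.2).
    grid = items.merge(times, how="cross")
    # Left merge keeps every grid row and fills absent values with NaN.
    out = grid.merge(frame, on=["item", "time"], how="left")
    out = out.sort_values(["item", "time"]).reset_index(drop=True)
    return out[list(frame.columns)]


result = pad_to_full_grid(df)
print(result)
```

Both helpers only know about time values that appear somewhere in the data. If a time step is missing for every item, you would have to pass an explicit list of time values instead of relying on unique(). Also note that value1 and value2 come back as floats, because NaN cannot be stored in an integer column.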

