
Group the data by one column and then build one row per group from the values in a data frame

I have a data frame with multiple ids in the column id. For each day, an id should have 5 time steps (6:00, 6:15, 6:30, 6:45, 7:00). However, some days do not have all 5, and I want to fill the missing values with NaN. Consider the following example:

import pandas as pd
df = pd.DataFrame()
df['id'] =   [1, 1, 1, 1, 1, 2, 2, 2,3, 3]
df['val'] = [11, 10, 12, 3, 4, 5, 125, 45,31, -2]
df['date'] = ['2019-03-31 06:00:00','2019-03-31 06:15:00', '2019-03-31 06:30:00', '2019-03-31 06:45:00', '2019-03-31 07:00:00', '2019-03-31 06:00:00', '2019-03-31 06:30:00',
              '2019-03-31 06:45:00', '2019-03-31 06:00:00', '2019-03-31 06:15:00']
df

For example, id=1 has 5 time steps.

id=2 has 3 time steps.

id=3 has 2 time steps.

So I want to put each id's values in a single row and add only the date (without the time) to that row.

Here is the desired output for my data frame:

[image: desired output]

Can you help me with this? Thank you so much.

One way, using pandas.DataFrame.pivot:

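# Keep only the calendar date of each timestamp
df["dates"] = pd.to_datetime(df["date"]).dt.date
# One row per (id, dates), one column per distinct timestamp; missing combinations become NaN
new_df = df.pivot(index=["id", "dates"], columns="date", values="val")
# Rename the timestamp columns to val1..valN
new_df.columns = [f"val{i+1}" for i in range(new_df.shape[1])]
new_df.reset_index()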

Output:

   id       dates  val1  val2   val3  val4  val5
0   1  2019-03-31  11.0  10.0   12.0   3.0   4.0
1   2  2019-03-31   5.0   NaN  125.0  45.0   NaN
2   3  2019-03-31  31.0  -2.0    NaN   NaN   NaN
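
Note that the pivot above uses the full timestamp as the column key, so with more than one day in the data it would likely produce five columns per day. A minimal sketch of a variant that pivots on the time of day instead, assuming the five slots listed in the question are the only ones expected (the names slots, slot and wide are introduced here for illustration and are not part of the original answer):

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 1, 1, 2, 2, 2, 3, 3],
    "val": [11, 10, 12, 3, 4, 5, 125, 45, 31, -2],
    "date": pd.to_datetime([
        "2019-03-31 06:00:00", "2019-03-31 06:15:00", "2019-03-31 06:30:00",
        "2019-03-31 06:45:00", "2019-03-31 07:00:00", "2019-03-31 06:00:00",
        "2019-03-31 06:30:00", "2019-03-31 06:45:00", "2019-03-31 06:00:00",
        "2019-03-31 06:15:00",
    ]),
})

# Assumption: these are the five expected time-of-day slots from the question.
slots = ["06:00", "06:15", "06:30", "06:45", "07:00"]

df["dates"] = df["date"].dt.date               # calendar date only
df["slot"] = df["date"].dt.strftime("%H:%M")   # time of day as "HH:MM"

wide = (
    df.pivot(index=["id", "dates"], columns="slot", values="val")
      .reindex(columns=slots)  # force all five columns, NaN where a slot is absent
)
wide.columns = [f"val{i + 1}" for i in range(len(slots))]
wide = wide.reset_index()
print(wide)
```

Because the columns are reindexed against a fixed slot list, the result has val1 to val5 even if a slot never occurs in the data. Passing a list to index in pivot needs a reasonably recent pandas version.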

