Group the data by two columns (id and date), then build rows from the values in the data frame

I have a data frame with multiple ids in the id column. Each day should have 5 time steps (6:00, 6:15, 6:30, 6:45, 7:00). However, some days have fewer than 5, and I want to fill the missing values with NaN. See the following example:

import pandas as pd
df = pd.DataFrame()
df['id'] =   [1, 1, 1, 1, 1, 2, 2, 2,3, 3, 1, 1]
df['val'] = [11, 10, 12, 3, 4, 5, 125, 45,31, -2,5,6]
df['date'] = ['2019-03-31 06:00:00','2019-03-31 06:15:00', '2019-03-31 06:30:00', '2019-03-31 06:45:00', '2019-03-31 07:00:00', '2019-03-31 06:00:00', '2019-03-31 06:30:00',
              '2019-03-31 06:45:00', '2019-03-31 06:00:00', '2019-03-31 06:15:00', '2019-04-1 06:00:00', '2019-04-1 06:15:00']

For example, id=1 has 5 time steps on 2019-03-31 and two values on 2019-04-01.

id=2 has 3 time steps.

id=3 has 2 time steps.

So I want to put the values for each id and day into a single row, and keep only the date part of the timestamp in that row. My final df should look as follows:

[image of the desired output]

Currently I am using the following code, which puts all the values side by side but creates 7 columns. I want 5 columns.

df["dates"] = pd.to_datetime(df["date"]).dt.date
new_df = df.pivot(index=["id", "dates"], columns="date", values="val")
new_df.columns = [f"val{i+1}" for i in range(new_df.shape[1])]
new_df.reset_index() 

Can you help me with that?

To create the columns based on the time of day, I added one line to your code and changed the pivot to use the time as the columns.

You were pivoting on 'date', which holds both the date and the time, so you end up with 7 columns (one per distinct timestamp).
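The column count can be checked directly. The following is a minimal sketch using the timestamps from the question (written here with zero-padded days); the variable names are only illustrative.

```python
import pandas as pd

# The 12 timestamps from the question's 'date' column
dates = pd.Series([
    "2019-03-31 06:00:00", "2019-03-31 06:15:00", "2019-03-31 06:30:00",
    "2019-03-31 06:45:00", "2019-03-31 07:00:00", "2019-03-31 06:00:00",
    "2019-03-31 06:30:00", "2019-03-31 06:45:00", "2019-03-31 06:00:00",
    "2019-03-31 06:15:00", "2019-04-01 06:00:00", "2019-04-01 06:15:00",
])
ts = pd.to_datetime(dates)

n_timestamps = ts.nunique()     # distinct date+time values -> columns when pivoting on 'date'
n_times = ts.dt.time.nunique()  # distinct times of day -> columns when pivoting on the time
print(n_timestamps, n_times)    # prints: 7 5
```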

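# Split each timestamp into its calendar date and its time of day
df["dates"] = pd.to_datetime(df["date"]).dt.date
df["time"] = pd.to_datetime(df["date"]).dt.time

# Pivot on the time of day, so each (id, dates) row gets one column per time step
new_df = df.pivot(index=["id", "dates"], columns="time", values="val")
new_df.columns = [f"val{i+1}" for i in range(new_df.shape[1])]
print(new_df.reset_index())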

    id  dates       val1    val2    val3    val4    val5
0   1   2019-03-31  11.0    10.0    12.0    3.0     4.0
1   1   2019-04-01  5.0     6.0     NaN     NaN     NaN
2   2   2019-03-31  5.0     NaN     125.0   45.0    NaN
3   3   2019-03-31  31.0    -2.0    NaN     NaN     NaN
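Note that the pivot above only produces 5 columns because every one of the 5 time steps occurs at least once somewhere in the data. If a time step might be absent from the whole data frame, one possible variation is to reindex the pivoted columns against a fixed list of expected times. The sketch below assumes the five slots are known in advance; `pivot_fixed_slots` and `SLOTS` are hypothetical names introduced only for this illustration.

```python
import datetime as dt

import pandas as pd


def pivot_fixed_slots(frame, slots):
    """Pivot `val` to one row per (id, day), with one column per expected slot."""
    ts = pd.to_datetime(frame["date"])
    tmp = frame.assign(dates=ts.dt.date, time=ts.dt.time)
    wide = tmp.pivot(index=["id", "dates"], columns="time", values="val")
    # reindex adds an all-NaN column for any slot that never occurs in the data
    wide = wide.reindex(columns=slots)
    wide.columns = [f"val{i + 1}" for i in range(len(slots))]
    return wide.reset_index()


df = pd.DataFrame({
    "id": [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 1, 1],
    "val": [11, 10, 12, 3, 4, 5, 125, 45, 31, -2, 5, 6],
    "date": [
        "2019-03-31 06:00:00", "2019-03-31 06:15:00", "2019-03-31 06:30:00",
        "2019-03-31 06:45:00", "2019-03-31 07:00:00", "2019-03-31 06:00:00",
        "2019-03-31 06:30:00", "2019-03-31 06:45:00", "2019-03-31 06:00:00",
        "2019-03-31 06:15:00", "2019-04-01 06:00:00", "2019-04-01 06:15:00",
    ],
})

# Assumption: these are the five expected time steps for every day
SLOTS = [dt.time(6, 0), dt.time(6, 15), dt.time(6, 30), dt.time(6, 45), dt.time(7, 0)]

result = pivot_fixed_slots(df, SLOTS)

# Drop the only 07:00 reading: val5 is still present, filled with NaN
result_no_7 = pivot_fixed_slots(df[df["date"] != "2019-03-31 07:00:00"], SLOTS)
print(result_no_7)
```

With the question's data, `result` should hold the same values as the table above; the difference only shows up when a slot such as 07:00 is missing everywhere, in which case a plain pivot would return 4 value columns instead of 5.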
