
Create a dataframe based on column values of another dataframe

I have a dataframe of shape 20000 x 50. Two of the columns are Date and Time (represented as an hour). The remaining columns hold observations of some parameters at those times. What I am trying to achieve is a new dataframe that averages all the remaining column values over every 3 hours of each day, with an ID column numbered from 1 to 8, each value representing one 3-hour range. I have enclosed an image of the source and of the expected result. Any help is very much appreciated.

Use groupby on the Date column and a binned Hour column, then aggregate with mean. The bins are built by subtracting 1 from Hour with sub, integer-dividing by 3 with floordiv, and adding 1 back with add, so hours 1 to 3 map to 1, hours 4 to 6 map to 2, and so on:

df['Hour'] = df['Hour'].sub(1).floordiv(3).add(1)
df = df.groupby(['Date', 'Hour'], as_index=False).mean()
print (df)
         Date  Hour      col1      col2      col3
0  05/01/2018     1  5.333333  5.333333  7.666667
1  05/01/2018     2  6.000000  6.000000  4.000000
2  06/01/2018     1  4.000000  6.333333  7.000000
3  06/01/2018     3  6.000000  6.000000  3.666667
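
Since the original data was only posted as an image, here is a minimal self-contained sketch of the same approach on made-up data. The sample values, the `binned` and `result` names, and the separate `ID` column are assumptions for illustration; hours are assumed to run from 1 to 24.

```python
import pandas as pd

# Hypothetical sample data (not the asker's data); hours assumed to be 1..24.
df = pd.DataFrame({
    "Date": ["05/01/2018"] * 4 + ["06/01/2018"] * 2,
    "Hour": [1, 2, 3, 4, 7, 9],
    "col1": [3.0, 6.0, 9.0, 5.0, 2.0, 4.0],
    "col2": [1.0, 2.0, 3.0, 8.0, 10.0, 20.0],
})

# Map hours 1-3 -> 1, 4-6 -> 2, ..., 22-24 -> 8 and keep it in a new ID column.
binned = df.assign(ID=df["Hour"].sub(1).floordiv(3).add(1))

# Drop the raw Hour column, then average the remaining columns per (Date, ID).
result = (
    binned.drop(columns="Hour")
          .groupby(["Date", "ID"], as_index=False)
          .mean()
)
print(result)
```

Writing the bin into a separate `ID` column instead of overwriting `Hour` keeps the original dataframe intact, which may be convenient if the raw hours are needed later.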

Detail:

print (df['Hour'].sub(1).floordiv(3).add(1))
0    1
1    1
2    1
3    2
4    1
5    1
6    1
7    3
8    3
9    3
Name: Hour, dtype: int64
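
To check the arithmetic behind the binning without pandas, the sketch below applies the same formula to every hour of a day. The function names are hypothetical, and the 0-based variant is an assumption for data whose hours run from 0 to 23 rather than 1 to 24, which the question does not specify.

```python
def hour_to_block(hour: int) -> int:
    """Map an hour in 1..24 to a 3-hour block ID in 1..8."""
    return (hour - 1) // 3 + 1


def hour0_to_block(hour: int) -> int:
    """Assumed variant for hours in 0..23: no need to subtract 1 first."""
    return hour // 3 + 1


# Every hour of the day lands in exactly one of the eight blocks.
blocks = [hour_to_block(h) for h in range(1, 25)]
blocks0 = [hour0_to_block(h) for h in range(24)]
print(blocks)
print(blocks0)
```

Applying the 1-based formula to an hour of 0 would give block 0, so it is worth confirming which convention the Hour column uses before grouping.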
