How to select every 4th row in a pandas dataframe and calculate the rolling average
I have a pandas dataframe that you can see in the screenshot. The dataframe has a time resolution of 15 minutes (it is generation data). I would like to reduce this time resolution to 1 hour, meaning that I should keep every 4th row, and the value in every 4th row should be the average of the last 4 rows (including that row). So it should be a rolling average with non-overlapping windows.
I tried the following for one column (wind offshore):
df_generation = pd.read_csv("C:/Users/Desktop/Data/generation_data.csv", sep =",")
df_generation_2 = df_generation
df_generation_2['Wind Offshore Average'] = df_generation_2['Wind Offshore'].rolling(4).mean()
But this is not what I really want. As you can see in the screenshot, my code just created an additional column with the average of the last 4 entries for every timeslot. Here the rolling average has overlapping windows. What I want is a new dataframe that only has one entry per hour (after every 4 timeslots of the original dataframe). Do you have an idea how I can do that? I'd appreciate every comment.
From looking at your index, it looks like the .resample method is what you are looking for (the documentation has many examples for specific uses): https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.resample.html
as in
new = df_generation['Wind Offshore'].resample('1H').mean()
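To make the difference from `rolling(4)` concrete, here is a minimal sketch on made-up data. It assumes the dataframe has a `DatetimeIndex` with 15-minute steps, which `.resample` needs; the sample values, the start date and the variable names `hourly`, `blocks` and `by_position` are invented for illustration and are not from the question. It also shows a position-based alternative that may help if the index is not a datetime index and the rows are evenly spaced.

```python
import numpy as np
import pandas as pd

# Hypothetical 15-minute data standing in for the CSV from the question.
index = pd.date_range("2022-01-01 00:00", periods=8, freq="15min")
df = pd.DataFrame(
    {"Wind Offshore": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]},
    index=index,
)

# Option 1: time-based. Each hour becomes one non-overlapping bin,
# and the mean is taken over the four 15-minute values in that bin.
hourly = df["Wind Offshore"].resample("60min").mean()

# Option 2: position-based. Rows 0-3 form block 0, rows 4-7 form block 1,
# and so on, regardless of what the index contains.
blocks = np.arange(len(df)) // 4
by_position = df["Wind Offshore"].groupby(blocks).mean()

print(hourly)
print(by_position)
```

With the default settings, each hourly value is labelled with the start of its bin (00:00 covers 00:00 to 00:45). If the label should instead be the last timestamp of each group of four, the `closed` and `label` parameters of `.resample` are probably the place to look.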