Rolling average based on another column

Question

I have a dataframe df which looks like this:

time(float)	value (float)
10.45	10
10.50	20
10.55	25
11.20	30
11.44	20
12.30	30

I need help calculating a new column called rolling_average_value, which is the average of the value in that row and all values from the hour before that row, so that the new dataframe looks like this:

time(float)	value (float)	rolling_average_value
10.45	10	10
10.50	20	15
10.55	25	18.33
11.20	30	21.25
11.44	20	21
12.30	30	25

Note: the time column is a float column, where 10.45 means 10:45.

Answer 1

You can temporarily set a datetime index and apply rolling.mean:

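# extract hours/minutes from the float (HH.MM)
import numpy as np
import pandas as pd

minutes, hours = np.modf(df['time(float)'])
hours = hours.astype(int)
# round before casting: 0.45 * 100 can come out as 44.99999999999993
minutes = minutes.mul(100).round().astype(int)
# zero-pad both parts so that e.g. 9 h 5 min becomes '0905', not '95'
dt = pd.to_datetime(hours.astype(str).str.zfill(2) + minutes.astype(str).str.zfill(2),
                    format='%H%M')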

# perform rolling computation
df['rolling_mean'] = (df.set_axis(dt)
                        .rolling('1h')['value (float)']
                        .mean()
                        .set_axis(df.index)
                      )

Output:

   time(float)  value (float)  rolling_mean
0        10.45             10     10.000000
1        10.50             20     15.000000
2        10.55             25     18.333333
3        11.20             30     21.250000
4        11.44             20     21.000000
5        12.30             30     25.000000
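
One detail worth knowing about rolling('1h'): with a time-based window, pandas closes the window on the right only by default, so a row exactly one hour earlier is left out. The following is a minimal sketch of that behaviour using a small made-up series (the sample values and variable names are assumptions for illustration, not part of the question's data); the closed parameter of rolling lets you include the left edge if that is what you need.

```python
import pandas as pd

# Hypothetical sample: the first and last rows are exactly one hour apart.
idx = pd.to_datetime(["10:45", "11:20", "11:45"], format="%H:%M")
s = pd.Series([10.0, 30.0, 20.0], index=idx)

# Default (closed="right"): the window is (t - 1h, t],
# so the 10:45 row is excluded when t = 11:45.
default_mean = s.rolling("1h").mean()

# closed="both": the window is [t - 1h, t],
# so the 10:45 row is included when t = 11:45.
inclusive_mean = s.rolling("1h", closed="both").mean()

print(pd.DataFrame({"default": default_mean, "both": inclusive_mean}))
```

For the data in the question no two rows are exactly one hour apart, so both settings give the same result there.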

Alternative to compute dt:

# format with two decimals first, so that 10.5 becomes '10.50' (10:50)
# rather than '10.5', which would be parsed as 10:05
dt = pd.to_datetime(df['time(float)'].map('{:05.2f}'.format),
                    format='%H.%M')

Answer 2

Assuming your data frame is sorted by time, you can also solve this with a simple list comprehension. Iterate over times and, for each row, collect the indices of all rows up to and including it whose time differs from the current time by less than one (meaning less than one hour). Use those indices to slice the value column, converted to an array, and then compute the mean of the sliced array:

import pandas as pd
import numpy as np


df = pd.DataFrame(
    {"time": [10.45, 10.5, 10.55, 11.2, 11.44, 12.3],
    "value": [10, 20, 25, 30, 20, 30]}     
)

times = df["time"].values
values = df["value"].values

df["rolling_mean"] = [round(np.mean(values[np.where(times[i] - times[:i+1] < 1)[0]]), 2) for i in range(len(times))]

If your data frame is large, you can JIT-compile this loop to machine code with numba to make it significantly faster:

from numba import njit

@njit
def compute_rolling_mean(times, values):
    return [round(np.mean(values[np.where(times[i] - times[:i+1] < 1)[0]]), 2) for i in range(len(times))]
    
df["rolling_mean"] = compute_rolling_mean(df["time"].values, df["value"].values)

Output:

    time    value  rolling_mean
0   10.45   10     10.00
1   10.50   20     15.00
2   10.55   25     18.33
3   11.20   30     21.25
4   11.44   20     21.00
5   12.30   30     25.00
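
The list comprehension above rescans all earlier rows for every row, so it does quadratic work. If that becomes a bottleneck, here is a minimal sketch of a vectorized alternative, assuming the frame is sorted by time and all rows fall on the same day. It is not part of the original answer: it converts the HH.MM floats to integer minutes (which also sidesteps float subtraction artifacts), finds each window's start with np.searchsorted, and takes window sums from a cumulative sum. The names hhmm, minutes, start, stop and csum are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"time": [10.45, 10.5, 10.55, 11.2, 11.44, 12.3],
     "value": [10, 20, 25, 30, 20, 30]}
)

# HH.MM float -> integer HHMM -> minutes since midnight.
# np.rint guards against results such as 1044.9999999999999.
hhmm = np.rint(df["time"].to_numpy() * 100).astype(int)  # 10.45 -> 1045
minutes = (hhmm // 100) * 60 + hhmm % 100                # 1045 -> 645

values = df["value"].to_numpy(dtype=float)

# For row i the window is (t_i - 60, t_i]: start[i] is the first row
# whose time is strictly later than t_i - 60, stop[i] is one past row i.
start = np.searchsorted(minutes, minutes - 60, side="right")
stop = np.arange(len(minutes)) + 1

# Window sums from a prefix sum, then divide by the window sizes.
csum = np.concatenate(([0.0], np.cumsum(values)))
df["rolling_mean"] = (csum[stop] - csum[start]) / (stop - start)

print(df.round(2))
```

On the sample data this should reproduce the rolling_mean column shown above.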
