Rolling average based on another column

I have a dataframe df that looks like this:

time(float)  value (float)
10.45        10
10.50        20
10.55        25
11.20        30
11.44        20
12.30        30

I need help calculating a new column called rolling_average_value, which is the average of that row's value and all values from the hour before that row, so that the new dataframe looks like this:

time(float)  value (float)  rolling_average_value
10.45        10             10
10.50        20             15
10.55        25             18.33
11.20        30             21.25
11.44        20             21
12.30        30             25

Note: the time column is a float column.
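For anyone who wants to reproduce the problem, the sample frame can be built as in the sketch below. The column names are copied from the table header above; reading the floats as HH.MM (so 10.45 means 10:45) is an assumption inferred from the expected output.

```python
import pandas as pd

# Sample data from the question; the time floats are assumed to encode HH.MM
df = pd.DataFrame({
    "time(float)": [10.45, 10.50, 10.55, 11.20, 11.44, 12.30],
    "value (float)": [10, 20, 25, 30, 20, 30],
})
print(df)
```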

You can temporarily set a datetime index and apply rolling.mean:

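# extract hours/minutes from the HH.MM float
import numpy as np
import pandas as pd

minutes, hours = np.modf(df['time(float)'])  # np.modf returns (fractional, integral)
hours = hours.astype(int)
# round before casting: the fractional part times 100 is rarely an exact integer in floating point
minutes = minutes.mul(100).round().astype(int)
# zero-pad both parts so that single-digit hours or minutes still match '%H%M'
dt = pd.to_datetime(hours.astype(str).str.zfill(2) + minutes.astype(str).str.zfill(2), format='%H%M')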

# perform rolling computation
df['rolling_mean'] = (df.set_axis(dt)
                        .rolling('1h')['value (float)']
                        .mean()
                        .set_axis(df.index)
                      )

Output:

   time(float)  value (float)  rolling_mean
0        10.45             10     10.000000
1        10.50             20     15.000000
2        10.55             25     18.333333
3        11.20             30     21.250000
4        11.44             20     21.000000
5        12.30             30     25.000000

Alternative way to compute dt:

# format with two decimals first: str(10.5) is '10.5', which would otherwise be read as 10:05
dt = pd.to_datetime(df['time(float)'].map('{:05.2f}'.format),  # 10.5 -> '10.50', 9.05 -> '09.05'
                    format='%H.%M')
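Another way to build dt that avoids string parsing altogether is sketched below. It is not part of the original answer: it assumes the floats encode HH.MM, converts them to whole minutes since midnight with integer arithmetic, and adds them to an arbitrary base date. The names hhmm and total_minutes and the base date 1970-01-01 are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({
    "time(float)": [10.45, 10.50, 10.55, 11.20, 11.44, 12.30],
    "value (float)": [10, 20, 25, 30, 20, 30],
})

# HH.MM float -> integer HHMM (rounding guards against values like 1044.9999...)
hhmm = df["time(float)"].mul(100).round().astype(int)
# integer HHMM -> minutes since midnight
total_minutes = (hhmm // 100) * 60 + hhmm % 100
# arbitrary base date; only the time differences matter for the rolling window
dt = pd.Timestamp("1970-01-01") + pd.to_timedelta(total_minutes, unit="min")

# time-based rolling window on a temporary datetime-indexed Series
s = pd.Series(df["value (float)"].to_numpy(), index=pd.DatetimeIndex(dt))
df["rolling_mean"] = s.rolling("1h").mean().to_numpy()
print(df)
```

With the default window closure, a row exactly one hour earlier falls outside the window, which matches the "less than one hour" reading used elsewhere on this page.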

Assuming your data frame is sorted by time, you can also solve this with a simple list comprehension. For each row i, find the indices of all rows up to and including i whose time is less than 1 (that is, less than one hour) before times[i], and use those indices to slice the value column, converted to an array. For valid HH.MM values, a difference below 1 corresponds to a gap of less than 60 minutes. Then compute the mean of the sliced array:

import pandas as pd
import numpy as np


df = pd.DataFrame(
    {"time": [10.45, 10.5, 10.55, 11.2, 11.44, 12.3],
    "value": [10, 20, 25, 30, 20, 30]}     
)

times = df["time"].values
values = df["value"].values

df["rolling_mean"] = [
    round(np.mean(values[np.where(times[i] - times[:i+1] < 1)[0]]), 2)
    for i in range(len(times))
]

If your data frame is large, you can compile this loop to machine code with Numba to make it significantly faster:

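import numpy as np
from numba import njit

@njit
def compute_rolling_mean(times, values):
    # same comprehension as above, compiled just in time by Numba
    return [round(np.mean(values[np.where(times[i] - times[:i+1] < 1)[0]]), 2) for i in range(len(times))]

# pass plain NumPy arrays; Numba cannot work on pandas objects directly
df["rolling_mean"] = compute_rolling_mean(df["time"].values, df["value"].values)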

Output:

    time    value  rolling_mean
0   10.45   10     10.00
1   10.50   20     15.00
2   10.55   25     18.33
3   11.20   30     21.25
4   11.44   20     21.00
5   12.30   30     25.00
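The list comprehension rescans all earlier rows for every row. For larger frames, a vectorized variant of the same idea is sketched below. It is not part of the answer above: it assumes the times are sorted in ascending order, finds the start of each window with np.searchsorted, and gets the window sums from a cumulative sum. The names start, end and csum are illustrative, and results may differ from the comprehension for rows that sit exactly on the one-hour boundary because of floating-point rounding.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"time": [10.45, 10.5, 10.55, 11.2, 11.44, 12.3],
     "value": [10, 20, 25, 30, 20, 30]}
)
times = df["time"].to_numpy()
values = df["value"].to_numpy()

# start[i]: first index whose time is strictly greater than times[i] - 1
start = np.searchsorted(times, times - 1, side="right")
# end[i]: one past the current row, so the window is values[start[i]:end[i]]
end = np.arange(len(times)) + 1
# csum[k] holds the sum of values[:k], so a window sum is csum[end] - csum[start]
csum = np.concatenate(([0.0], np.cumsum(values)))
df["rolling_mean"] = ((csum[end] - csum[start]) / (end - start)).round(2)
print(df)
```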
