Extrapolating dataframes to calculate 15min and 30min averages

Suppose I have a dataframe like this (the time column has 3-minute windows, and the rows are grouped by ID-A and ID-B):

ID-A  ID-B    time     sum   num
A       1   09:30:00    5     2
        1   09:33:00    8     2
        1   09:36:00    5     2
        2   09:36:00    10    3
        2   09:39:00    15    3
        2   09:42:00    2     3
B       1   09:30:00    10    2
        1   09:33:00    12    2
        1   09:36:00    5     2

I am trying to calculate the 15-minute and 30-minute averages of sum divided by num. Here is a reproducible version of my df:

import pandas as pd

data = {'time': ['09:30:00',
                 '09:33:00',
                 '09:36:00',
                 '09:36:00',
                 '09:39:00',
                 '09:42:00',
                 '09:30:00',
                 '09:33:00',
                 '09:36:00'],
         'sum': [5, 8, 5, 10, 15, 2, 10, 12, 5],
         'num': ['2', '2', '2', '3', '3', '3', '2', '2', '2']}
my_index = pd.MultiIndex.from_arrays([["A"]*6 + ["B"]*3, [1, 1, 1, 2, 2, 2, 1, 1, 1]], names=["ID-A", "ID-B"])
df = pd.DataFrame(data, index=my_index)

Note: for any one pair of ID-A and ID-B, num is always the same.

Desired dataframe (grouped by ID-A and ID-B):

ID-A  ID-B    time     sum   num   15min   30min  
A       1   09:30:00    5     2     15      30  
            09:33:00    8     2     15      30  
            09:36:00    5     2     15      30  
        2   09:36:00    10    3     15      30  
            09:39:00    15    3     15      30  
            09:42:00    2     3     15      30  
B       1   09:30:00    10    2     22.5    45  
            09:33:00    12    2     22.5    45  
            09:36:00    5     2     22.5    45  

For example, for ID-A = A and ID-B = 1, time data is available for only 9 minutes. So I computed (5+8+5)/9 = 18/9 = 2 per minute. That also has to be divided by num, so 2/2 = 1. Therefore the value is 15 for 15 minutes and 30 for 30 minutes. There could be cases where time data is available for the full 15 or 30 minutes. In that case no extrapolation is required and only the normal calculation should happen.
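The arithmetic above can be written out step by step. This is only a sketch for the single group ID-A = A, ID-B = 1, with the values copied from the table; the variable names are illustrative and not part of the original post.

```python
# Worked example for the group ID-A = A, ID-B = 1 (values copied from the table).
sums = [5, 8, 5]        # 'sum' for the 09:30, 09:33 and 09:36 windows
num = 2                 # 'num' is constant within one (ID-A, ID-B) pair
window_minutes = 3      # each row covers a 3-minute window

minutes_covered = window_minutes * len(sums)   # 9 minutes of data
per_minute = sum(sums) / minutes_covered       # 18 / 9 = 2.0
per_minute_per_num = per_minute / num          # 2.0 / 2 = 1.0

avg_15min = 15 * per_minute_per_num            # extrapolated to 15 minutes
avg_30min = 30 * per_minute_per_num            # extrapolated to 30 minutes
print(avg_15min, avg_30min)
```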

My approach: since the longest average I need is 30 minutes, I thought of extrapolating all the values to 30 minutes first, so that I don't have to care whether all values are present. Eventually I only want the ID-A, ID-B, 15min and 30min columns in my df, but the layout above will also work.
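If only the ID-A, ID-B, 15min and 30min columns are wanted, one possible shape is a single row per pair. The sketch below assumes that every row covers exactly 3 minutes and that num is already numeric; `WINDOW_MINUTES`, `stats`, `rate` and `summary` are names introduced here for illustration.

```python
import pandas as pd

# Same example data as in the question; 'num' is given as numbers here for simplicity.
data = {
    'time': ['09:30:00', '09:33:00', '09:36:00',
             '09:36:00', '09:39:00', '09:42:00',
             '09:30:00', '09:33:00', '09:36:00'],
    'sum': [5, 8, 5, 10, 15, 2, 10, 12, 5],
    'num': [2, 2, 2, 3, 3, 3, 2, 2, 2],
}
my_index = pd.MultiIndex.from_arrays(
    [["A"] * 6 + ["B"] * 3, [1, 1, 1, 2, 2, 2, 1, 1, 1]],
    names=["ID-A", "ID-B"],
)
df = pd.DataFrame(data, index=my_index)

WINDOW_MINUTES = 3  # assumption: every row covers exactly 3 minutes

# One row per (ID-A, ID-B): total of 'sum', number of rows, and the constant 'num'.
stats = df.groupby(level=['ID-A', 'ID-B']).agg(
    total=('sum', 'sum'),
    rows=('sum', 'count'),
    num=('num', 'first'),
)

# Per-minute rate divided by 'num', then scaled to 15 and 30 minutes.
rate = stats['total'] / (WINDOW_MINUTES * stats['rows']) / stats['num']
summary = pd.DataFrame({'15min': 15 * rate, '30min': 30 * rate}).reset_index()
print(summary)
```

Because the rate is computed from however many rows a pair actually has, the same expression covers pairs with more or fewer 3-minute windows.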

It looks like this would work:

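# cast 'num' to float (it is stored as strings in the example data)
df['num'] = df['num'].astype(float)

def add_cols(grp):
    # per-minute rate: total 'sum' / (3 minutes per row * number of rows),
    # then divide by the group's 'num' (constant within a group)
    multiple = grp['sum'].sum() / (3 * len(grp)) / grp['num'].iloc[0]
    return grp.assign(**{'15min': 15 * multiple, '30min': 30 * multiple})

# group_keys=False keeps the original index instead of prepending the group keys again
df.groupby(['ID-A', 'ID-B'], group_keys=False).apply(add_cols)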

Output:

                          time  sum  num  15min  30min
ID-A ID-B
A    1     2022-09-18 09:30:00    5  2.0   15.0   30.0
     1     2022-09-18 09:33:00    8  2.0   15.0   30.0
     1     2022-09-18 09:36:00    5  2.0   15.0   30.0
     2     2022-09-18 09:36:00   10  3.0   15.0   30.0
     2     2022-09-18 09:39:00   15  3.0   15.0   30.0
     2     2022-09-18 09:42:00    2  3.0   15.0   30.0
B    1     2022-09-18 09:30:00   10  2.0   22.5   45.0
     1     2022-09-18 09:33:00   12  2.0   22.5   45.0
     1     2022-09-18 09:36:00    5  2.0   22.5   45.0
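The same per-row columns can also be produced without `apply`, by broadcasting group totals back onto the rows with `transform`. This is an alternative sketch, not the approach shown above; it assumes 3-minute rows and a numeric num column, and `total`, `rows`, `multiple` and `result` are illustrative names.

```python
import pandas as pd

# Same example data as in the question; 'num' is given as numbers here for simplicity.
data = {
    'time': ['09:30:00', '09:33:00', '09:36:00',
             '09:36:00', '09:39:00', '09:42:00',
             '09:30:00', '09:33:00', '09:36:00'],
    'sum': [5, 8, 5, 10, 15, 2, 10, 12, 5],
    'num': [2, 2, 2, 3, 3, 3, 2, 2, 2],
}
my_index = pd.MultiIndex.from_arrays(
    [["A"] * 6 + ["B"] * 3, [1, 1, 1, 2, 2, 2, 1, 1, 1]],
    names=["ID-A", "ID-B"],
)
df = pd.DataFrame(data, index=my_index)

grouped = df.groupby(level=['ID-A', 'ID-B'])

# transform returns one value per original row, aligned with df's index.
total = grouped['sum'].transform('sum')    # group total of 'sum'
rows = grouped['sum'].transform('count')   # number of 3-minute rows in the group

# Per-minute rate divided by 'num' (assumes each row covers 3 minutes).
multiple = total / (3 * rows) / df['num']
result = df.assign(**{'15min': 15 * multiple, '30min': 30 * multiple})
print(result)
```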
