
Average of two time series in pandas

Ultimate goal description

My goal is to compute the average of two time series (red and green) stored in pandas DataFrames. However, while both time series have the same columns, they differ in their precise time points. What I want to implement is a function average which computes the average time series from the two given series such that if a value is missing for a particular time point, it is interpolated. For example:

import pandas as pd
green_df = pd.DataFrame({'A': [4, 2, 5], 'B': [1, 2, 3]}, index=[1, 3, 6])
red_df = pd.DataFrame({'A': [4, 2.5, 8, 2, 4], 'B': [4, 2, 2, 4, 1]}, index=[1, 2, 4, 5, 6])

average_grey_df = pd.DataFrame({'A': [4, 2.7, 3.75, 5.5, 3, 4.5], 'B': [...]}, index= [1, 2, 3, 4, 5, 6])

# == on DataFrames compares element-wise, so use .equals() for a single True/False
assert average_grey_df.equals(average(green_df, red_df))

It is obvious when displayed graphically (values are shown for column A, but the same should be done with all columns; the precise values are just illustrative):

[Image: plot of the series for column A]

Approach

So far I have not been able to find a completely working solution. I was thinking about dividing it into three steps:

(1) extend both time series by the time points from the other time series, such that the missing data are nan

                    A  | ...                    A | ...
                -------                     -------
                1 | 4 |                     1 | 4 |
                2 |nan|                     2 |2.5|
    green:      3 | 2 |         red:        3 |nan|
                4 |nan|                     4 | 8 |
                5 |nan|                     5 | 2 |
                6 | 5 |                     6 | 4 |

(2) fill the missing data by interpolating both dataframes (direct usage of the dataframe interpolate method)

(3) finally, compute the average of these two time series as follows:

averages = (green_df.stack() + red_df.stack()) / 2
average_grey_df = averages.unstack()

Additionally, the dropna method can be used to drop the created nans. Moreover, maybe there is a better method I haven't discovered.
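The three steps above can be sketched as a single function. This is only a minimal sketch under stated assumptions: it uses index.union and reindex for step (1), the default linear interpolate for step (2), and a plain element-wise mean for step (3). The helper names union_idx, a and b are illustrative and do not come from the question.

```python
import pandas as pd

def average(df_a, df_b):
    """Average two frames that share columns but differ in index points."""
    # (1) extend both frames to the union of their time points; new rows are NaN
    union_idx = df_a.index.union(df_b.index)
    a = df_a.reindex(union_idx)
    b = df_b.reindex(union_idx)
    # (2) fill the gaps with the default linear interpolation
    a = a.interpolate()
    b = b.interpolate()
    # (3) element-wise mean; drop rows that are still NaN
    #     (for example time points before one of the series starts)
    return ((a + b) / 2).dropna()

green_df = pd.DataFrame({'A': [4, 2, 5], 'B': [1, 2, 3]}, index=[1, 3, 6])
red_df = pd.DataFrame({'A': [4, 2.5, 8, 2, 4], 'B': [4, 2, 2, 4, 1]}, index=[1, 2, 4, 5, 6])

average_grey_df = average(green_df, red_df)
print(average_grey_df)
```

For the example data, column A of the result is 4, 2.75, 3.625, 5.5, 3, 4.5 on the index 1 to 6, which is close to the illustrative values given in the question.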

Question

I was not able to figure out how to do part (1) at all. I checked methods like join, merge and concat with their various examples, but none of them seems to do the job. Any suggestions? I am also open to other approaches.

Thank you

You can merge the two dfs. From there, you can interpolate the NA values:

green_df = pd.DataFrame({'A': [4, 2, 5], 'B': [1, 2, 3]}, index=[1, 3, 6])
red_df = pd.DataFrame({'A': [4, 2.5, 8, 2, 4], 'B': [4, 2, 2, 4, 1]}, index=[1, 2, 4, 5, 6])

combined_df = pd.merge(green_df, red_df, suffixes=('_green', '_red'), left_index=True, right_index=True, how='outer')
combined_df = combined_df.interpolate()
combined_df['A_avg'] = combined_df[["A_green", "A_red"]].mean(axis=1)
combined_df['B_avg'] = combined_df[["B_green", "B_red"]].mean(axis=1)

These can then be plotted using .plot():

combined_df[['A_green', 'A_red', 'A_avg']].plot(color=['green', 'red', 'gray'])

[Image: plot of A_green, A_red and A_avg]
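The merge-based approach above names columns A and B by hand. A possible generalization, shown only as a sketch, loops over the columns the two frames share and builds one "_avg" column per name. The loop and the f-string column names are an assumption layered on top of the answer, not part of it.

```python
import pandas as pd

green_df = pd.DataFrame({'A': [4, 2, 5], 'B': [1, 2, 3]}, index=[1, 3, 6])
red_df = pd.DataFrame({'A': [4, 2.5, 8, 2, 4], 'B': [4, 2, 2, 4, 1]}, index=[1, 2, 4, 5, 6])

# outer merge on the index keeps every time point from both frames
combined_df = pd.merge(green_df, red_df, suffixes=('_green', '_red'),
                       left_index=True, right_index=True, how='outer')
combined_df = combined_df.interpolate()

# one "<col>_avg" column per shared column instead of hard-coding A and B
for col in green_df.columns.intersection(red_df.columns):
    pair = [f'{col}_green', f'{col}_red']
    combined_df[f'{col}_avg'] = combined_df[pair].mean(axis=1)

print(combined_df[['A_avg', 'B_avg']])
```

Note that mean(axis=1) skips NaN by default, so at a time point where only one series has a value (for instance before the other series starts) the "average" is simply that one value rather than NaN.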

To perform task (1) you can do this:

# union of the indexes
union_idx = green_df.index.union(red_df.index)

# reindex with the union
green_df = green_df.reindex(union_idx)
red_df = red_df.reindex(union_idx)

# the interpolation
green_df = green_df.interpolate(method='linear', limit_direction='forward', axis=0)
red_df = red_df.interpolate(method='linear', limit_direction='forward', axis=0)

# stack both frames and average the rows that share an index value
grey_df = pd.concat([green_df, red_df])
grey_df = grey_df.groupby(level=0).mean()

I get (I didn't pay attention to displaying the correct colors):

[Image: my result]
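One detail that may matter for real data: interpolate with method='linear' ignores the index and treats rows as equally spaced. In the example the union index 1 to 6 is evenly spaced, so this makes no difference, but with irregular time points method='index' interpolates using the actual index values. A small sketch with made-up data (the series s is hypothetical and not taken from the question):

```python
import numpy as np
import pandas as pd

# hypothetical series with unevenly spaced time points 0, 1 and 10
s = pd.Series([0.0, np.nan, 10.0], index=[0, 1, 10])

# 'linear' treats the three rows as equally spaced
by_position = s.interpolate(method='linear')
# 'index' weights by the distance between index values
by_index = s.interpolate(method='index')

print(by_position.loc[1])  # 5.0
print(by_index.loc[1])     # 1.0
```

Whether index-aware interpolation is the right choice depends on whether the index really represents time; for a DatetimeIndex, method='time' plays the same role.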
