Average of two time series in pandas

Question

Ultimate goal description

My goal is to compute the average of two time series (red and green) stored in pandas DataFrames. Both time series have the same columns, but they are sampled at different time points. I want to implement a function average that computes the average time series from the two given series, interpolating a value whenever one series has no observation at a particular time point. For example:

import pandas as pd
green_df = pd.DataFrame({'A': [4, 2, 5], 'B': [1, 2, 3]}, index=[1, 3, 6])
red_df = pd.DataFrame({'A': [4, 2.5, 8, 2, 4], 'B': [4, 2, 2, 4, 1]}, index=[1, 2, 4, 5, 6])

average_grey_df = pd.DataFrame({'A': [4, 2.7, 3.75, 5.5, 3, 4.5], 'B': [...]}, index= [1, 2, 3, 4, 5, 6])

# DataFrame.equals returns a single bool; `assert df1 == df2` raises ValueError
assert average_grey_df.equals(average(green_df, red_df))

The idea is clearer when displayed graphically (values are shown for column A, but the same should be done for all columns; the precise values are only illustrative):

Approach

So far I have not been able to find a completely working solution. I was thinking about dividing it into three steps:

(1) Extend both time series with the time points of the other time series, so that the missing data are NaN:

                    A  | ...                    A | ...
                -------                     -------
                1 | 4 |                     1 | 4 |
                2 |nan|                     2 |2.5|
    green:      3 | 2 |         red:        3 |nan|
                4 |nan|                     4 | 8 |
                5 |nan|                     5 | 2 |
                6 | 5 |                     6 | 4 |

(2) Fill the missing data by interpolating both DataFrames (direct use of the DataFrame interpolate method).

(3) Finally, compute the average of these two time series as follows:

averages = (green_df.stack() + red_df.stack()) / 2
average_grey_df = averages.unstack()

Additionally, the dropna method can be used to drop the NaN values that were created. There may also be a better method that I have not discovered.

Question

I was not able to figure out how to do step (1) at all. I looked at methods such as join, merge and concat and their various examples, but none of them seems to do the job. Any suggestions? I am also open to other approaches.

Thank you
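As a hedged illustration of why steps (1) and (2) matter, the sketch below averages the two example frames directly, without any interpolation. pandas arithmetic aligns on index labels, so time points present in only one frame come out as NaN, and dropna then leaves only the time points the two series share. The names naive and common_only are introduced here purely for illustration.

```python
import pandas as pd

green_df = pd.DataFrame({'A': [4, 2, 5], 'B': [1, 2, 3]}, index=[1, 3, 6])
red_df = pd.DataFrame({'A': [4, 2.5, 8, 2, 4], 'B': [4, 2, 2, 4, 1]}, index=[1, 2, 4, 5, 6])

# Arithmetic aligns on the index: a row that exists in only one frame becomes NaN.
naive = (green_df + red_df) / 2

# dropna keeps only the time points observed in both frames (here 1 and 6).
common_only = naive.dropna()
print(common_only)
```

Without interpolation, four of the six time points are lost, which is the gap that steps (1) and (2) are meant to close.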

Answer 1

You can merge the two DataFrames with an outer join on the index. From there, you can interpolate the NA values:

green_df = pd.DataFrame({'A': [4, 2, 5], 'B': [1, 2, 3]}, index=[1, 3, 6])
red_df = pd.DataFrame({'A': [4, 2.5, 8, 2, 4], 'B': [4, 2, 2, 4, 1]}, index=[1, 2, 4, 5, 6])

combined_df = pd.merge(green_df, red_df, suffixes=('_green', '_red'), left_index=True, right_index=True, how='outer')
combined_df = combined_df.interpolate()
combined_df['A_avg'] = combined_df[["A_green", "A_red"]].mean(axis=1)
combined_df['B_avg'] = combined_df[["B_green", "B_red"]].mean(axis=1)

These can then be plotted using .plot():

combined_df[['A_green', 'A_red', 'A_avg']].plot(color=['green', 'red', 'gray'])
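The code above writes one line per column. A minimal sketch of the same merge-and-interpolate idea for any number of shared columns could look like the following; it assumes both frames have identical column names, and the name avg_df is introduced here for illustration.

```python
import pandas as pd

green_df = pd.DataFrame({'A': [4, 2, 5], 'B': [1, 2, 3]}, index=[1, 3, 6])
red_df = pd.DataFrame({'A': [4, 2.5, 8, 2, 4], 'B': [4, 2, 2, 4, 1]}, index=[1, 2, 4, 5, 6])

# Outer join on the index, then fill the gaps by linear interpolation.
combined_df = pd.merge(green_df, red_df, suffixes=('_green', '_red'),
                       left_index=True, right_index=True, how='outer').interpolate()

# Build one <column>_avg column per shared column instead of naming each by hand.
for col in green_df.columns:
    combined_df[f'{col}_avg'] = combined_df[[f'{col}_green', f'{col}_red']].mean(axis=1)

# Keep only the averaged columns.
avg_df = combined_df[[f'{col}_avg' for col in green_df.columns]]
print(avg_df)
```

Note that mean(axis=1) skips NaN by default, so a row where only one series has a value would report that single value rather than NaN.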

Answer 2

To perform task (1), you can do this:

# union of the indexes
union_idx = green_df.index.union(red_df.index)

# reindex with the union
green_df = green_df.reindex(union_idx)
red_df = red_df.reindex(union_idx)

# the interpolation
green_df = green_df.interpolate(method='linear', limit_direction='forward', axis=0)
red_df = red_df.interpolate(method='linear', limit_direction='forward', axis=0)

# stack both frames and average the rows that share an index label
grey_df = pd.concat([green_df, red_df])
grey_df = grey_df.groupby(level=0).mean()

I get (I didn't pay attention to displaying the correct colors):
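The steps above can be wrapped into the average function the question asks for. The sketch below is one possible arrangement, not the answer's exact code: it swaps in method='index', which interpolates using the actual index values, whereas method='linear' ignores the index and treats the rows as equally spaced. For the example data the union of time points is 1 to 6 with no gaps, so both methods give the same result.

```python
import pandas as pd

def average(df1, df2):
    """Average two frames with the same columns on the union of their indexes."""
    union_idx = df1.index.union(df2.index)
    # method='index' weights the interpolation by the distance between index
    # values, which matters when the combined time points are unevenly spaced.
    filled = [df.reindex(union_idx).interpolate(method='index') for df in (df1, df2)]
    return (filled[0] + filled[1]) / 2

green_df = pd.DataFrame({'A': [4, 2, 5], 'B': [1, 2, 3]}, index=[1, 3, 6])
red_df = pd.DataFrame({'A': [4, 2.5, 8, 2, 4], 'B': [4, 2, 2, 4, 1]}, index=[1, 2, 4, 5, 6])

grey_df = average(green_df, red_df)
print(grey_df)
```

With the default limit_direction, time points before a frame's first observation are not filled and remain NaN, so the average is also NaN there; both example series start at time point 1, so this does not arise here.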

2 answers

Answer 1
1 2020-07-02 16:53:12

Answer 2
1 accepted 2020-07-02 17:06:26
