Average of two time series in pandas
My goal is to compute the average of two time series (red and green) stored in pandas DataFrames. Both time series have the same columns, but they are sampled at different time points. What I want to implement is a function average that computes the average time series from the two given series, interpolating a value wherever one series has no data for a particular time point. For example:
import pandas as pd
green_df = pd.DataFrame({'A': [4, 2, 5], 'B': [1, 2, 3]}, index=[1, 3, 6])
red_df = pd.DataFrame({'A': [4, 2.5, 8, 2, 4], 'B': [4, 2, 2, 4, 1]}, index=[1, 2, 4, 5, 6])
average_grey_df = pd.DataFrame({'A': [4, 2.7, 3.75, 5.5, 3, 4.5], 'B': [...]}, index= [1, 2, 3, 4, 5, 6])
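# comparing two DataFrames with == inside assert raises ValueError, so use the testing helper
pd.testing.assert_frame_equal(average_grey_df, average(green_df, red_df))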
It is obvious when displayed graphically (values shown for column A, but the same should be done for all columns; precise values are just illustrative):
So far I have not been able to find a completely working solution. I was thinking about dividing it into three steps:
(1) extend both time series with the time points from the other time series, so that the missing data are nan
green:      A  | ...        red:        A  | ...
          -----------                 -----------
       1 |  4  |                   1 |  4  |
       2 | nan |                   2 | 2.5 |
       3 |  2  |                   3 | nan |
       4 | nan |                   4 |  8  |
       5 | nan |                   5 |  2  |
       6 |  5  |                   6 |  4  |
(2) fill the missing data by interpolating both dataframes (direct usage of the DataFrame interpolate method)
(3) finally compute the average of these two time series as follows:
averages = (green_df.stack() + red_df.stack()) / 2
average_grey_df = averages.unstack()
Additionally, the dropna method can be used to drop any nan values that were created. Maybe there is also a better method I haven't discovered.
I was not able to figure out how to compute part (1) at all. I checked methods like join, merge and concat with their various examples, but none of them seems to do the job. Any suggestions? I am also open to other approaches.
Thank you
You can merge the two DataFrames with an outer join on the index. From there, you can interpolate the NA values:
green_df = pd.DataFrame({'A': [4, 2, 5], 'B': [1, 2, 3]}, index=[1, 3, 6])
red_df = pd.DataFrame({'A': [4, 2.5, 8, 2, 4], 'B': [4, 2, 2, 4, 1]}, index=[1, 2, 4, 5, 6])
combined_df = pd.merge(green_df, red_df, suffixes=('_green', '_red'), left_index=True, right_index=True, how='outer')
combined_df = combined_df.interpolate()
combined_df['A_avg'] = combined_df[["A_green", "A_red"]].mean(axis=1)
combined_df['B_avg'] = combined_df[["B_green", "B_red"]].mean(axis=1)
These can then be plotted using .plot():
combined_df[['A_green', 'A_red', 'A_avg']].plot(color=['green', 'red', 'gray'])
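One caveat with the merge approach: interpolate() defaults to method='linear', which ignores the index and treats the rows as equally spaced. That is fine here because the combined index is 1 to 6 without gaps, but for unevenly spaced time points method='index' is probably what you want. The sketch below is a variation on the code above, not a drop-in replacement; the loop over green_df.columns and the extra sort_index() call are my own additions.

```python
import pandas as pd

green_df = pd.DataFrame({'A': [4, 2, 5], 'B': [1, 2, 3]}, index=[1, 3, 6])
red_df = pd.DataFrame({'A': [4, 2.5, 8, 2, 4], 'B': [4, 2, 2, 4, 1]},
                      index=[1, 2, 4, 5, 6])

# Outer join on the index: every time point from either frame gets a row
combined_df = pd.merge(green_df, red_df, suffixes=('_green', '_red'),
                       left_index=True, right_index=True, how='outer')

# method='index' needs an increasing index, so sort explicitly to be safe
combined_df = combined_df.sort_index().interpolate(method='index')

# Average each shared column instead of naming A and B by hand
for col in green_df.columns:
    combined_df[f'{col}_avg'] = combined_df[[f'{col}_green', f'{col}_red']].mean(axis=1)
```

With the example data, A_avg comes out as 4, 2.75, 3.625, 5.5, 3, 4.5 for the time points 1 to 6.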
To perform task (1) you can do this:
# union of the indexes
union_idx = green_df.index.union(red_df.index)
# reindex with the union, time points missing from a frame become NaN rows
green_df = green_df.reindex(union_idx)
red_df = red_df.reindex(union_idx)
# the interpolation
green_df = green_df.interpolate(method='linear', limit_direction='forward', axis=0)
red_df = red_df.interpolate(method='linear', limit_direction='forward', axis=0)
# stack both frames and average the rows that share an index label
grey_df = pd.concat([green_df, red_df])
grey_df = grey_df.groupby(level=0).mean()
I get the following (I did not pay attention to displaying the correct colors)
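If you want the average function from the question, the reindex steps above can be wrapped up roughly like this. Treat it as a sketch under a few assumptions of mine: both frames have the same columns, the index is numeric and sortable, and the method parameter (defaulting to 'index' so that uneven spacing is respected) is my own addition rather than something the question asked for.

```python
import pandas as pd

def average(df_a, df_b, method='index'):
    """Average two frames that share columns but differ in index points."""
    # Index.union returns the sorted set of all time points
    union_idx = df_a.index.union(df_b.index)
    # Reindexing inserts NaN rows, interpolate fills them in
    a = df_a.reindex(union_idx).interpolate(method=method)
    b = df_b.reindex(union_idx).interpolate(method=method)
    # Arithmetic aligns on index and columns, so no stack/unstack is needed
    return (a + b) / 2

green_df = pd.DataFrame({'A': [4, 2, 5], 'B': [1, 2, 3]}, index=[1, 3, 6])
red_df = pd.DataFrame({'A': [4, 2.5, 8, 2, 4], 'B': [4, 2, 2, 4, 1]},
                      index=[1, 2, 4, 5, 6])

average_grey_df = average(green_df, red_df)
```

Unlike the groupby(...).mean() version, which skips NaN, a row where either series still has no value after interpolation comes out as NaN here; dropna can remove such rows if needed.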