
Python: how to linearly interpolate monthly data?

I'm fairly new to Python, especially the data libraries, so please excuse any idiocy.

I'm trying to practise with a made-up data set of monthly observations over 12 months. The data looks like this:

print(data)

2017-04-17  156
2017-05-09  216
2017-06-11  300
2017-07-29  184
2017-08-31  162
2017-09-24   91
2017-10-15  225
2017-11-03  245
2017-12-26  492
2018-01-26  485
2018-02-18  401
2018-03-09  215
2018-04-30  258

These monthly observations are irregular (there is exactly one in each month, but nowhere near the same day of the month).

Now, I want to use linear interpolation to get the values at the start of each month.

I've tried a bunch of methods and was able to do it 'manually', but I'm trying to get to grips with pandas and numpy, and I know it can be done with these. Here's what I have so far. I make a Series holding the data, and then I do:

resampled1 = data.resample('MS')
interp1 = resampled1.interpolate()

print(interp1)

This prints:

2017-04-01   NaN
2017-05-01   NaN
2017-06-01   NaN
2017-07-01   NaN
2017-08-01   NaN
2017-09-01   NaN
2017-10-01   NaN
2017-11-01   NaN
2017-12-01   NaN
2018-01-01   NaN
2018-02-01   NaN
2018-03-01   NaN
2018-04-01   NaN

Now, I know that the first one (2017-04-01, which falls before my first observation on 2017-04-17) should be NaN, since linear interpolation (which I believe is the default) interpolates between the two points before and after, and that is not possible because I don't have a data point before April 1st. As for the others, I'm not certain what I'm doing wrong. It's probably just because I'm struggling to wrap my head around exactly what resample is doing.

You probably want to resample('D') to interpolate, e.g.:

In []:
data.resample('D').interpolate().asfreq('MS')

Out[]:
2017-05-01  194.181818
2017-06-01  274.545455
2017-07-01  251.666667
2017-08-01  182.000000
2017-09-01  159.041667
2017-10-01  135.666667
2017-11-01  242.894737
2017-12-01  375.490566
2018-01-01  490.645161
2018-02-01  463.086957
2018-03-01  293.315789
2018-04-01  234.019231
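
To make the above reproducible, here is a minimal self-contained sketch using the sample data from the question. The names `on_grid`, `daily` and `month_start` are mine, not from the original posts. As far as I understand it, resample('MS') followed by interpolate() first places the data on the month-start grid, where none of the observations fall, so there is nothing left to interpolate between (newer pandas releases may behave differently here). Resampling to daily first keeps every observed date on the grid:

```python
import pandas as pd

# Sample data from the question, indexed by observation date.
data = pd.Series(
    [156, 216, 300, 184, 162, 91, 225, 245, 492, 485, 401, 215, 258],
    index=pd.to_datetime([
        "2017-04-17", "2017-05-09", "2017-06-11", "2017-07-29",
        "2017-08-31", "2017-09-24", "2017-10-15", "2017-11-03",
        "2017-12-26", "2018-01-26", "2018-02-18", "2018-03-09",
        "2018-04-30",
    ]),
)

# Reindexing straight onto month starts: no observation falls on the 1st,
# so every slot is NaN and there is nothing to interpolate between.
on_grid = data.resample("MS").asfreq()
print(on_grid.isna().all())

# Upsample to a daily grid instead (all observed dates lie on it),
# fill the gaps linearly, then keep only the month-start rows.
daily = data.resample("D").interpolate()
month_start = daily.asfreq("MS")
print(month_start)
```

Note that the result starts at 2017-05-01: the daily grid begins at the first observation (2017-04-17), so there is no value for 2017-04-01.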

Try to use RedBlackPy.

from datetime import datetime
import redblackpy as rb

index = [datetime(2017,4,17), datetime(2017,5,9), datetime(2017,6, 11)]
values = [156, 216, 300]

series = rb.Series(index=index, values=values, interpolate='linear')
# Now you can access by any key with no insertion, using interpolation.
print(series[datetime(2017, 5, 1)]) # prints 194.18182373046875
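
Since the question also mentions numpy, the same point lookup can be sketched with numpy.interp. This is not part of either answer above, just a cross-check under the assumption that the dates are first converted to plain numbers (days since the first observation); the helper `to_days` and the variable names are my own:

```python
import numpy as np

# First three observations from the question.
obs_dates = np.array(["2017-04-17", "2017-05-09", "2017-06-11"], dtype="datetime64[D]")
obs_values = np.array([156.0, 216.0, 300.0])

# Month starts that lie between the observations.
targets = np.array(["2017-05-01", "2017-06-01"], dtype="datetime64[D]")


def to_days(dates):
    """Express dates as float days elapsed since the first observation."""
    return (dates - obs_dates[0]) / np.timedelta64(1, "D")


# np.interp needs numeric, increasing x-coordinates, hence the conversion.
interpolated = np.interp(to_days(targets), to_days(obs_dates), obs_values)
print(interpolated)
```

The two values agree with the first two rows of the resample('D') output (about 194.18 and 274.55).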
