Python linear interpolation of values in dataframe
I have a Python dataframe with hourly values for January 2015, except that some hours are missing both the index and the value. Ideally the dataframe, with columns named "dates" and "values", should have 744 rows. However, 10 hours are missing at random, so it has only 734 rows. I want to interpolate the missing hours in the month to create the desired dataframe with 744 "dates" and 744 "values".
Edit:
I am new to Python, so I am struggling to implement this idea:
Edit2:
I was looking for hints or code snippets. Based on the suggestion below, I was able to create the following code, but it fails to fill in the values that are zeros at the start of the month, i.e. for hours 1 through 5 on Jan 1.
import pandas as pd
st_dt = '2015-01-01'
en_dt = '2015-01-31'
DateTimeHour = pd.date_range(pd.Timestamp(st_dt).date(), pd.Timestamp(en_dt).date(), freq='H')
Pwr.index = pd.DatetimeIndex(Pwr.index)  # Pwr is the original dataframe
Pwr = Pwr.reindex(DateTimeHour, fill_value=0)  # missing hours are filled with 0 here
Pwr2 = pd.Series(Pwr.values)
Pwr2.interpolate(limit_direction='both')
What you want requires a combination of this technique: Add missing dates to pandas dataframe, and the pandas function pandas.Series.interpolate. From what you've said, the option 'linear' is what you want.
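A minimal sketch of that combination, assuming the frame has a DatetimeIndex and a single "values" column. The toy data and the name `full_index` are made up for illustration and are not taken from the question:

```python
import pandas as pd

# Toy hourly data with 02:00 and 03:00 missing (illustrative values only).
df = pd.DataFrame(
    {"values": [10.0, 20.0, 50.0, 60.0]},
    index=pd.to_datetime(
        ["2015-01-01 00:00", "2015-01-01 01:00",
         "2015-01-01 04:00", "2015-01-01 05:00"]
    ),
)

# Build the complete hourly index and reindex onto it; the added rows get NaN.
full_index = pd.date_range(df.index.min(), df.index.max(), freq="h")
df = df.reindex(full_index)

# Fill the NaNs by linear interpolation between the neighbouring values.
df["values"] = df["values"].interpolate(method="linear")
print(df)
```

For the real data, `full_index` would presumably cover the whole month, from '2015-01-01 00:00' to '2015-01-31 23:00', which gives 744 rows.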
EDIT:
Interpolate will not work in the case where you have data points missing at the very start of the time series. One idea is to use pandas.Series.fillna with 'backfill' after the interpolation. Also, do not set fill_value to 0 when you call reindex; otherwise the missing hours hold zeros instead of NaN and there is nothing left to interpolate.
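A rough sketch of that idea, with made-up data whose first recorded hour is 03:00. It uses `Series.bfill()`, which is the shorthand for fillna with the 'backfill' method; the series `s` and `full_index` are hypothetical names:

```python
import pandas as pd

# Toy series whose first recorded hour is 03:00 (illustrative values only).
s = pd.Series(
    [4.0, 8.0, 10.0],
    index=pd.to_datetime(
        ["2015-01-01 03:00", "2015-01-01 05:00", "2015-01-01 06:00"]
    ),
)

# Reindex WITHOUT fill_value so that the missing hours become NaN, not 0.
full_index = pd.date_range("2015-01-01 00:00", "2015-01-01 06:00", freq="h")
s = s.reindex(full_index)

# Interior gaps are interpolated; with the default forward direction the
# leading NaNs (00:00 to 02:00) are left untouched.
s = s.interpolate(method="linear")

# Back-fill the leading NaNs with the first known value.
s = s.bfill()
print(s)
```

Back-filling simply repeats the first known value, so the first hours come out flat rather than extrapolated.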
Use df.asfreq to expand the DataFrame so as to have an hourly frequency. NaN is inserted for missing values:
df = df.asfreq('H')
then use df.interpolate to replace the NaNs with (linearly) interpolated values based on the DatetimeIndex and the nearest non-NaN values:
df = df.interpolate(method='time')
For example,
import numpy as np
import pandas as pd
N, M = 744, 734
index = pd.date_range('2015-01-01', periods=N, freq='H')
idx = np.random.choice(np.arange(N), M, replace=False)
idx.sort()
index = index[idx]
# This creates a toy DataFrame with 734 non-null rows:
df = pd.DataFrame({'values': np.random.randint(10, size=(M,))}, index=index)
# This expands the DataFrame to 744 rows (10 null rows):
df = df.asfreq('H')
# This makes `df` have 744 non-null rows:
df = df.interpolate(method='time')
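The question describes a frame with "dates" and "values" columns rather than a DatetimeIndex. A possible adaptation, assuming the "dates" column holds parseable timestamps (the toy values are illustrative only):

```python
import pandas as pd

# Toy frame shaped like the one in the question: the timestamps live in a
# "dates" column rather than in the index (illustrative values only).
df = pd.DataFrame(
    {
        "dates": ["2015-01-01 00:00", "2015-01-01 01:00", "2015-01-01 04:00"],
        "values": [1.0, 2.0, 8.0],
    }
)

# Move the timestamps into a DatetimeIndex so asfreq can see the gaps.
df["dates"] = pd.to_datetime(df["dates"])
df = df.set_index("dates")

# Insert the missing hours as NaN rows, then interpolate along the time axis.
df = df.asfreq("h").interpolate(method="time")

# Restore the original two-column layout.
df = df.reset_index()
print(df)
```

Note that asfreq only spans from the first to the last existing timestamp, so hours missing at the very start or end of the month would need a reindex against an explicit date_range instead.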
A general interpolation is the following:
If the key exists:
else: