
Python pandas DataFrame: interpolate from DataFrame data without modifying it, just get the interpolated value

I am fairly new to the Python pandas library and cannot find an answer to my problem in other posts. I have a dataframe that looks like this. The dates are the index and the series are the columns.

>>> MyDataframe
            Serie1  Serie2  Serie3  Serie4  Serie5
2011-04-30      92      96     NaN     NaN     NaN
2011-05-31     164     168      12      16     NaN
2011-06-30     238     242      90      20      88
2011-07-31     322     326     169     120     167

I would like to perform 1D linear interpolations within this dataframe without modifying it; I just want to get the result. For instance, I want to determine the value of Serie2 on 2011-06-10. The functions DataFrame.interpolate() and Series.interpolate() only seem useful for replacing NaN values with interpolated data.

Is there a function that could perform something like:

Result = MyDataframe['Serie2'].interpolate('2011-06-10')

and it would simply return the linear interpolation between 168 and 242 (the Serie2 values on 2011-05-31 and 2011-06-30).
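For reference, the number being asked for is plain two-point linear interpolation. A minimal sketch, assuming the two bracketing Serie2 observations are read off the table by hand; numpy.interp is used only as a cross-check and is not part of the answer below:

```python
import numpy as np
import pandas as pd

# Bracketing Serie2 observations, copied from the table above
t0, y0 = pd.Timestamp("2011-05-31"), 168.0
t1, y1 = pd.Timestamp("2011-06-30"), 242.0
target = pd.Timestamp("2011-06-10")

# y0 + (y1 - y0) * (fraction of the interval elapsed at the target date)
frac = (target - t0) / (t1 - t0)  # Timedelta / Timedelta gives a float (10/30 here)
manual = y0 + (y1 - y0) * frac

# Same calculation with numpy.interp, using days since t0 as the x-coordinate
via_numpy = np.interp((target - t0).days, [0, (t1 - t0).days], [y0, y1])

print(manual, via_numpy)
```

Both values come out at roughly 192.667, i.e. 168 + 74 * 10/30.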

Thanks in advance for your support!

interpolate only fills NaN values at labels that already exist in the index, so you have to reindex the df to include the date you want and then call interpolate:

In [48]:
df.reindex(pd.date_range(df.index[0], df.index[-1])).interpolate().loc['2011-06-10']

Out[48]:
Serie1    188.666667
Serie2    192.666667
Serie3     38.000000
Serie4     17.333333
Serie5           NaN
Name: 2011-06-10 00:00:00, dtype: float64

Once this is done you can select a specific date and column:

In [49]:
df.reindex(pd.date_range(df.index[0], df.index[-1])).interpolate().loc['2011-06-10', 'Serie2']

Out[49]:
192.66666666666666

Here I generate a new DatetimeIndex with one entry per day, running from the first to the last value of your index, using date_range.
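Put together as a self-contained script, the daily-reindex approach might look like the sketch below. The helper name interpolate_at is hypothetical, and the sample DataFrame simply re-enters the question's table:

```python
import numpy as np
import pandas as pd

# The question's data: dates as the index, one column per series
df = pd.DataFrame(
    {
        "Serie1": [92, 164, 238, 322],
        "Serie2": [96, 168, 242, 326],
        "Serie3": [np.nan, 12, 90, 169],
        "Serie4": [np.nan, 16, 20, 120],
        "Serie5": [np.nan, np.nan, 88, 167],
    },
    index=pd.to_datetime(["2011-04-30", "2011-05-31", "2011-06-30", "2011-07-31"]),
)


def interpolate_at(frame, date, column):
    """Hypothetical helper: linearly interpolate `column` at `date` via a daily reindex."""
    daily = pd.date_range(frame.index[0], frame.index[-1])  # one row per calendar day
    # Rows are evenly spaced (one per day), so the default method="linear" matches time
    return frame.reindex(daily).interpolate().loc[pd.Timestamp(date), column]


value = interpolate_at(df, "2011-06-10", "Serie2")
print(value)
```

Because reindex and interpolate both return new objects, df itself is left unchanged, which is what the question asks for.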

Building a row for every day in the whole range is wasteful; it is more efficient to interpolate only between the two existing index values that bracket your target date.
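One lighter alternative (not the method used in this answer) is to add only the target date to the index and let interpolate(method='time') weight each value by the real time gaps. A sketch, assuming the target date falls between the first and last index dates:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Serie1": [92, 164, 238, 322],
        "Serie2": [96, 168, 242, 326],
        "Serie3": [np.nan, 12, 90, 169],
        "Serie4": [np.nan, 16, 20, 120],
        "Serie5": [np.nan, np.nan, 88, 167],
    },
    index=pd.to_datetime(["2011-04-30", "2011-05-31", "2011-06-30", "2011-07-31"]),
)

target = pd.Timestamp("2011-06-10")

# Insert just the one missing label instead of every calendar day
expanded = df.reindex(df.index.union([target]))

# The rows are now unevenly spaced, so use method="time" rather than the default
value = expanded.interpolate(method="time").loc[target, "Serie2"]
print(value)
```

With the default method='linear' the five rows would be treated as equally spaced, and the result would be the midpoint of 168 and 242 (205) instead of about 192.667.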

We can find the position of the first index value after the target date using get_slice_bound; the bracketing dates are then at positions start-1 and start:

In [70]:
start = df.index.get_slice_bound('2011-06-10', side='right')

df.reindex(pd.date_range(df.index[start-1], df.index[start])).interpolate().loc['2011-06-10', 'Serie2']
Out[70]:
192.66666666666666
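Once the two bracketing positions are known, the reindex step is not strictly needed; the value can be computed directly from the two neighboring rows. A sketch using Index.searchsorted, which on a sorted index should give the same position as get_slice_bound(..., side='right'). The helper name interp_between_neighbors is hypothetical:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Serie1": [92, 164, 238, 322],
        "Serie2": [96, 168, 242, 326],
        "Serie3": [np.nan, 12, 90, 169],
        "Serie4": [np.nan, 16, 20, 120],
        "Serie5": [np.nan, np.nan, 88, 167],
    },
    index=pd.to_datetime(["2011-04-30", "2011-05-31", "2011-06-30", "2011-07-31"]),
)


def interp_between_neighbors(frame, date, column):
    """Hypothetical helper: interpolate `column` at `date` from the two bracketing rows.

    Assumes a sorted DatetimeIndex and frame.index[0] <= date < frame.index[-1].
    """
    date = pd.Timestamp(date)
    hi = frame.index.searchsorted(date, side="right")  # first position after `date`
    lo = hi - 1  # last position at or before `date`
    t0, t1 = frame.index[lo], frame.index[hi]
    y0, y1 = frame[column].iloc[lo], frame[column].iloc[hi]
    # Fraction of the [t0, t1] interval elapsed at `date`
    return y0 + (y1 - y0) * ((date - t0) / (t1 - t0))


value = interp_between_neighbors(df, "2011-06-10", "Serie2")
print(value)
```

This touches only two rows regardless of how far apart the index dates are, and like the other approaches it never modifies df.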
