
Pandas dataframe conditional interpolation

Suppose I have a data frame that looks like this:

import pandas as pd
import numpy as np

na = np.nan

df = pd.DataFrame({
    'location' : ['a','a','a','a','a','b','b','b','b','b'],
    'temp' : [11.6,12.2,na,12.4,12.9,27.9,27.6,na,27.2,26.8],
})

And suppose I want to interpolate missing values only in location a, and I would like to use this:

df.loc[df['location']=='a'].interpolate(method = 'linear',inplace=True)
print(df)

But it gives me a warning, and the missing value is not filled:

/usr/local/lib/python3.8/dist-packages/pandas/core/frame.py:10709: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  return super().interpolate(
  location  temp
0        a  11.6
1        a  12.2
2        a   NaN
3        a  12.4
4        a  12.9
5        b  27.9
6        b  27.6
7        b   NaN
8        b  27.2
9        b  26.8

Any help or reference would be helpful. Thanks

The problem is that you cannot use inplace here, because df.loc[...] creates a new filtered DataFrame that is a subset of the original df. With inplace=True, pandas tries to modify that new DataFrame in place, but you don't keep a reference to it, so you get the warning above and the line has no effect on df, as your printed output shows.

For performance, store the mask in a helper variable and filter with it on both sides of the assignment:

m = df['location']=='a'
# method='linear' is the default, so it is omitted
df[m] = df[m].interpolate()
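
As a self-contained check of the assignment above, here is a minimal sketch on the data from the question. It narrows the assignment to the numeric temp column, which is my own variation rather than part of the answer; the assumption is that this avoids asking pandas to interpolate the string column location, which newer pandas versions may warn about.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'location': ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b'],
    'temp': [11.6, 12.2, np.nan, 12.4, 12.9, 27.9, 27.6, np.nan, 27.2, 26.8],
})

# Boolean mask for the rows to fill, computed once and reused on both sides
m = df['location'] == 'a'

# Interpolate only the numeric column of the selected rows and assign it back
# (restricting to 'temp' is an assumption of this sketch, not the original answer)
df.loc[m, 'temp'] = df.loc[m, 'temp'].interpolate()

print(df)
```

The gap in location a is filled, while the gap in location b stays NaN because those rows are never selected.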

inplace=True doesn't work here because it acts on a copy. Assign the result back instead:

>>> df.loc[df['location'] == 'a'] = df.interpolate()
>>> df
  location  temp
0        a  11.6
1        a  12.2
2        a  12.3
3        a  12.4
4        a  12.9
5        b  27.9
6        b  27.6
7        b   NaN
8        b  27.2
9        b  26.8
>>> 

Or:

df.loc[df['location'] == 'a'] =  df.loc[df['location'] == 'a'].interpolate()

method='linear' is omitted because it is the default.
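
The two assignments agree on the data in the question, but they may differ when the missing value sits at the edge of a group. The sketch below uses hypothetical data (not from the question) in which the last a row is missing: interpolating the whole column fills it with a value halfway between the last a reading and the first b reading, whereas interpolating only the a rows never sees values from b.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the missing 'a' reading is the last row of its group
df = pd.DataFrame({
    'location': ['a', 'a', 'a', 'b', 'b'],
    'temp': [11.6, 12.2, np.nan, 27.9, 27.6],
})
m = df['location'] == 'a'

# Whole-column interpolation uses the first 'b' reading as the right-hand neighbour
whole = df['temp'].interpolate()

# Interpolating the 'a' rows alone only uses 'a' readings
subset = df.loc[m, 'temp'].interpolate()

print(whole.loc[2])
print(subset.loc[2])
```

If readings from different locations should never influence each other, the second form is probably the safer choice.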

Or try df.mask, which returns a new DataFrame rather than modifying df:

>>> df.mask(df['location'] == 'a', df.interpolate())
  location  temp
0        a  11.6
1        a  12.2
2        a  12.3
3        a  12.4
4        a  12.9
5        b  27.9
6        b  27.6
7        b   NaN
8        b  27.2
9        b  26.8
>>> 
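
Note that mask returns a new object, so the result has to be assigned back for df to change. A minimal sketch, assuming you only want to touch the temp column (the Series.mask form and the names filled and still_missing are my own, not from the answer):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'location': ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b'],
    'temp': [11.6, 12.2, np.nan, 12.4, 12.9, 27.9, 27.6, np.nan, 27.2, 26.8],
})
m = df['location'] == 'a'

# Where m is True, take the interpolated value; elsewhere keep the original
filled = df['temp'].mask(m, df['temp'].interpolate())

# df itself is unchanged at this point: both gaps are still present
still_missing = int(df['temp'].isna().sum())

# Assign back to make the change stick
df['temp'] = filled

print(still_missing)
print(df)
```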


 