Python pandas: how to fill values between existing ones in dataframe column?

I have a pandas DataFrame with 3 columns. The first column contains string values in ascending order at a fixed step (e.g. '20173070000', '20173070020', '20173070040', etc.). The second and third columns contain the corresponding numeric values. I would like to upsample the first column to every integer step ('20173070000', '20173070001', '20173070002', ...), filling the second and third columns with NaN in the new rows, and then interpolate those NaN values.

I've looked into resampling, but that appears to work only for datetime values. I have also looked into DataFrame.interpolate(), but that only fills in values that are already missing. As stated above, my dataset does not contain missing data; I simply want to increase the frequency of my entries by filling in between the existing values.
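For reference, resample() is not strictly limited to datetime values: it also accepts a TimedeltaIndex. A minimal sketch, assuming the keys can be treated as integers and temporarily mapped to an offset of (key - first key) seconds; the one-second unit is an arbitrary choice made only so that resample('1s') steps through every integer key.

```python
import pandas as pd

# The question's data; column 0 holds the keys as strings.
df = pd.DataFrame({0: ['20173070000', '20173070020', '20173070040',
                       '20173070060', '20173070080', '20173070100'],
                   1: [14.0, 14.1, 13.8, 13.7, 13.8, 13.9],
                   2: [13.9, 14.1, 13.6, 13.7, 13.5, 14.0]})

keys = df[0].astype('int64')
start = keys.min()

# Treat each key as an offset of (key - start) seconds, so the index becomes
# a TimedeltaIndex and resample() accepts it.
offsets = pd.to_timedelta(keys - start, unit='s')
upsampled = df[[1, 2]].set_index(offsets).resample('1s').asfreq()

# Map the 1-second steps back to integer keys; the new rows hold NaN.
upsampled.index = upsampled.index.total_seconds().astype('int64') + start
upsampled.index.name = 0
```

asfreq() only inserts the missing steps; the existing values are kept unchanged, so interpolate() can be applied afterwards.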

To give some reference, my current DataFrame looks like this:

         0             1             2
0      20173070000    14.0          13.9
1      20173070020    14.1          14.1
2      20173070040    13.8          13.6
3      20173070060    13.7          13.7
4      20173070080    13.8          13.5
5      20173070100    13.9          14.0

I would like to generate a DataFrame that looks like:

         0             1             2
0      20173070000    14.0          13.9
1      20173070001    NaN           NaN
2      20173070002    NaN           NaN
3      20173070003    NaN           NaN
4      20173070004    NaN           NaN
5      20173070005    NaN           NaN
...
20     20173070020    14.1          14.1
21     20173070021    NaN           NaN
...

I have no problem handling the interpolation afterwards, but I have not worked out how to upsample yet.

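You can use the reindex method. By default, it inserts NaN at every label in the new index that has no value in the original. The example below uses integer keys; if the keys are strings, as in the question, convert them with astype('int64') first so that a step-1 range can be built.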

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [20173070000, 20173070020, 20173070040, 20173070060, 20173070080, 20173070100],
                   'B': [14, 14.1, 13.8, 13.7, 13.8, 13.9],
                   'C': [13.9, 14.1, 13.6, 13.7, 13.5, 14.0]})

# One row per integer key from the first to the last; rows that did not exist get NaN
df = df.set_index('A').reindex(np.arange(df['A'].min(), df['A'].max() + 1)).reset_index()
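Putting the pieces together for the question's actual data, where the keys are strings: a minimal sketch that converts column 0 to int64, reindexes onto every integer key, interpolates linearly and converts the keys back to strings. It assumes all keys have the same width and no leading zeros, so astype(str) reproduces the original strings.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({0: ['20173070000', '20173070020', '20173070040',
                       '20173070060', '20173070080', '20173070100'],
                   1: [14.0, 14.1, 13.8, 13.7, 13.8, 13.9],
                   2: [13.9, 14.1, 13.6, 13.7, 13.5, 14.0]})

# reindex needs numeric keys to build a step-1 range, so convert the strings.
df[0] = df[0].astype('int64')

# One row per integer key; rows that did not exist before are all-NaN.
full_range = np.arange(df[0].min(), df[0].max() + 1)
upsampled = df.set_index(0).reindex(full_range)

# Fill the gaps linearly, then turn the keys back into a string column.
result = upsampled.interpolate().rename_axis(0).reset_index()
result[0] = result[0].astype(str)
```

With this data, row 10 (key '20173070010') ends up halfway between 14.0 and 14.1 in column 1, i.e. about 14.05.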

I believe interpolate() is the way to go here. After upsampling as you described, and assuming the column containing the values you want to interpolate is called 'val1', you can do:

df.loc[:, 'val1'] = df.loc[:, 'val1'].interpolate()
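One detail worth knowing: the default method='linear' treats the rows as equally spaced and ignores the index values. After reindexing onto every consecutive integer key that is exactly what you want, but if the keys have uneven gaps, method='index' weights the interpolation by the index values instead. A minimal sketch, using a hypothetical Series standing in for 'val1':

```python
import numpy as np
import pandas as pd

# Hypothetical 'val1' values keyed by unevenly spaced integer keys.
s = pd.Series([0.0, np.nan, 10.0], index=[0, 1, 10], name='val1')

by_position = s.interpolate()           # rows treated as equally spaced
by_key = s.interpolate(method='index')  # weighted by the index values
```

Here by_position fills key 1 with 5.0 (halfway between the neighbouring rows), while by_key fills it with 1.0 (one tenth of the way from key 0 to key 10).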
