简体   繁体   中英

In pandas: Interpolate between two rows such that the sum of interpolated values + second row = old value of the second row

I found it really hard to articulate this problem, but it is a similar problem to this one .

For example, let's say I have a table like this:

Label Value
A 1
B 2
C Nan
D Nan
E Nan
F 12

It needs to look like this:

Label Value
A 1
B 2
C 3
D 3
E 3
F 3

Used the next available value (12) and divided it by the number of Nan values + 1 (12/4) = 3 and replaced the Nan values AND the original value used to interpolate(12) by 3. It is similar to the previous question but also modifies the original value used for interpolation.

test = pd.DataFrame({'Label': ['A', 'B', 'C', 'D', 'E', 'F','G','H','I'],
                     'Value': [1, 2, None, None, None, 12,None,None,4]})

test['break'] = np.where(test['Value'].notnull(),1,0)
test['group'] = test['break'].shift().fillna(0).cumsum()
test['Value2'] = test.groupby('group').Value.apply(lambda x: x.fillna( x.max() / len(x)))

for row in range(0,test.shape[0]):
    if test['break'].iloc[row] == 0 and test['break'].iloc[row+1] == 1:
        test.at[row+1, 'Value2'] = test['Value2'].iloc[row]

df.interpolate() is not capable of doing this, and this is what I have so far. It gets the job done, but is not very elegant

Perhaps something like this?

test = pd.DataFrame({'Label': ['A', 'B', 'C', 'D', 'E', 'F','G','H','I'],
                     'Value': [1, 2, None, None, None, 12,None,None,4]})

tr = test.assign(
    g=(~test['Value'].isna())[::-1].cumsum()
).groupby('g')['Value'].transform

df = test.assign(Value=tr('last') / tr('size'))

>>> df
  Label     Value
0     A  1.000000
1     B  2.000000
2     C  3.000000
3     D  3.000000
4     E  3.000000
5     F  3.000000
6     G  1.333333
7     H  1.333333
8     I  1.333333

Explanation

With the .assign(g=...) we make groups of values that end with a non-NaN values and preceded by zero or more NaN:

>>> test.assign(
...     g=(~test['Value'].isna())[::-1].cumsum()
... )

  Label  Value  g
0     A    1.0  4
1     B    2.0  3
2     C    NaN  2
3     D    NaN  2
4     E    NaN  2
5     F   12.0  2
6     G    NaN  1
7     H    NaN  1
8     I    4.0  1

Then, we use .groupby('g')['Values'].transform twice: to get the last() value, and to divide it by the size() of the group.

In the similar vain as the very good answer by Pierre D , you could also use the following solution for the purpose of grouping in one go:

test['Value'] = ((test.assign(id = (~test['Value'].isna()).cumsum().mask(test['Value'].isna())
                              .bfill()))
 .groupby('id').transform(lambda x: x.bfill() / len(x)))

  Label     Value
0     A  1.000000
1     B  2.000000
2     C  3.000000
3     D  3.000000
4     E  3.000000
5     F  3.000000
6     G  1.333333
7     H  1.333333
8     I  1.333333

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM