
Python Pandas: How to subtract values in two non-consecutive rows in a specific column of a dataframe from one another

I am trying to populate a new column in a Pandas df by subtracting the values in two non-consecutive rows of another column in the same df. I can do it, so long as the df does not have a column with dates in it. But if it does have a column with dates, then pandas throws an error.

Assume the following dataframe.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 55, 9], [10, 99, 19], [27, 38, 29], [39, 10, 72]]),
                   columns=['a', 'b', 'c'])
df['Date'] = ['2020-01-02', '2020-01-05', '2020-06-10', '2020-08-05', '2020-09-01', '2020-10-29']
df['Date'] = pd.to_datetime(df['Date'])

df['d'] = ''
df = df[['Date', 'a', 'b', 'c', 'd']]

This gives me a df that looks like this:

    Date        a   b   c   d
0   2020-01-02  1   2   3   
1   2020-01-05  4   5   6   
2   2020-06-10  7   55  9   
3   2020-08-05  10  99  19  
4   2020-09-01  27  38  29  
5   2020-10-29  39  10  72  

I am trying to create a new column 'd' that, for each row, holds the value of column 'b' two rows below minus the value of column 'b' in the row in question. For instance, the value in row [0], column ['d'] would be calculated as df.loc[2]['b'] - df.loc[0]['b'].

What I'm trying (which doesn't work) is:

for i in range(len(df)-2):
    df.loc[i]['d'] = df.loc[i+2]['b'] - df.loc[i]['b']

I can get this to work if I have no date in the df. But when I add a column with dates, it throws an error message saying

A value is trying to be set on a copy of a slice from a DataFrame

I can't figure out why a date column causes the df to be unable to do math on columns with only int64 data. I've tried searching this site and just can't seem to solve the problem. Any help would be greatly appreciated.

You can do it in vectorized form using shift (which is considerably faster than using loops):

df['d'] = df['b'].shift(-2) - df['b']
df

Output:

        Date   a   b   c     d
0 2020-01-02   1   2   3  53.0
1 2020-01-05   4   5   6  94.0
2 2020-06-10   7  55   9 -17.0
3 2020-08-05  10  99  19 -89.0
4 2020-09-01  27  38  29   NaN
5 2020-10-29  39  10  72   NaN
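
As for why the loop in the question fails: df.loc[i]['d'] = ... is chained indexing. df.loc[i] first builds a row Series, and as far as I understand it, once the columns have mixed dtypes (which happens when the datetime column is added) that row is a copy, so the assignment never reaches df. Below is a minimal sketch of a loop that should work, assuming the same df as in the question. The one change I made beyond the indexing is to initialize 'd' with NaN instead of '' so the column stays numeric; the names shifted and same are only there for the comparison.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.array([[1, 2, 3], [4, 5, 6], [7, 55, 9], [10, 99, 19], [27, 38, 29], [39, 10, 72]]),
    columns=['a', 'b', 'c'],
)
df['Date'] = pd.to_datetime(
    ['2020-01-02', '2020-01-05', '2020-06-10', '2020-08-05', '2020-09-01', '2020-10-29']
)

# Assumption: start 'd' as NaN rather than '' so the column stays numeric.
df['d'] = np.nan

# Select row and column in a single .loc call so the write goes to df itself,
# not to a temporary row object.
for i in range(len(df) - 2):
    df.loc[i, 'd'] = df.loc[i + 2, 'b'] - df.loc[i, 'b']

# Compare against the vectorized form; NaN positions are treated as equal.
shifted = df['b'].shift(-2) - df['b']
same = np.allclose(df['d'], shifted, equal_nan=True)
```

The loop is shown only to explain the warning. For real data the shift version is still the one to prefer.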


 