
Python Pandas subtract value of row from value of previous row

I have the following DataFrame:

import pandas as pd

data = [['AAA','2019-01-01', 10], ['AAA','2019-01-02', 21],
        ['AAA','2019-02-01', 30], ['AAA','2019-02-02', 45],
        ['BBB','2019-01-01', 50], ['BBB','2019-01-02', 60],
        ['BBB','2019-02-01', 70],['BBB','2019-02-02', 80]]

dfx = pd.DataFrame(data, columns = ['NAME', 'TIMESTAMP','VALUE'])

  NAME   TIMESTAMP  VALUE
0  AAA  2019-01-01     10
1  AAA  2019-01-02     21
2  AAA  2019-02-01     30
3  AAA  2019-02-02     45
4  BBB  2019-01-01     50
5  BBB  2019-01-02     60
6  BBB  2019-02-01     70
7  BBB  2019-02-02     80

I want to generate a new column that holds, for each row, the difference between its VALUE and the VALUE of the previous row with the same NAME. The first row of each NAME has no previous row, so it should stay empty.

So the output would look somewhat like this,

  NAME   TIMESTAMP  VALUE  DIFF
0  AAA  2019-01-01     10  
1  AAA  2019-01-02     21  11
2  AAA  2019-02-01     30   9
3  AAA  2019-02-02     45  15
4  BBB  2019-01-01     50
5  BBB  2019-01-02     60  10
6  BBB  2019-02-01     70  10
7  BBB  2019-02-02     80  10

Regards.

You could do:

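# transform keeps the original row index, so the result aligns with dfx
# (on pandas 2.x, apply would add NAME as an extra index level and the assignment would fail)
dfx['DIFF'] = dfx.groupby('NAME')['VALUE'].transform(lambda x: x - x.shift()).fillna(0)
print(dfx)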

Output

  NAME   TIMESTAMP  VALUE  DIFF
0  AAA  2019-01-01     10   0.0
1  AAA  2019-01-02     21  11.0
2  AAA  2019-02-01     30   9.0
3  AAA  2019-02-02     45  15.0
4  BBB  2019-01-01     50   0.0
5  BBB  2019-01-02     60  10.0
6  BBB  2019-02-01     70  10.0
7  BBB  2019-02-02     80  10.0
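
Note that fillna(0) puts 0 in the first row of each group, while the output requested in the question leaves those cells empty. A minimal sketch of one way to keep them empty, assuming a pandas version that supports the nullable 'Int64' dtype (the dtype choice is my suggestion, not part of the original answer):

```python
import pandas as pd

data = [['AAA', '2019-01-01', 10], ['AAA', '2019-01-02', 21],
        ['AAA', '2019-02-01', 30], ['AAA', '2019-02-02', 45],
        ['BBB', '2019-01-01', 50], ['BBB', '2019-01-02', 60],
        ['BBB', '2019-02-01', 70], ['BBB', '2019-02-02', 80]]
dfx = pd.DataFrame(data, columns=['NAME', 'TIMESTAMP', 'VALUE'])

# Per-group difference; the first row of each NAME has no predecessor,
# so it comes back as NaN instead of being filled with 0.
diff = dfx.groupby('NAME')['VALUE'].transform(lambda x: x - x.shift())

# Nullable integer dtype keeps whole numbers (11 rather than 11.0)
# and represents the gaps as <NA>.
dfx['DIFF'] = diff.astype('Int64')
print(dfx)
```

Skipping the astype call also works if float values with NaN are acceptable.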

A simpler solution:

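# select VALUE first; diff() on the whole group would also try to subtract the string TIMESTAMP column
dfx['DIFF'] = dfx.groupby('NAME')['VALUE'].diff()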
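
diff() compares each row with the one physically before it in its group, so the result depends on row order. The sample data is already ordered by TIMESTAMP within each NAME; if that cannot be guaranteed, a sketch like the following may help. The conversion to datetime and the sort are my assumptions about what the data needs, not something stated in the question:

```python
import pandas as pd

data = [['AAA', '2019-01-01', 10], ['AAA', '2019-01-02', 21],
        ['AAA', '2019-02-01', 30], ['AAA', '2019-02-02', 45],
        ['BBB', '2019-01-01', 50], ['BBB', '2019-01-02', 60],
        ['BBB', '2019-02-01', 70], ['BBB', '2019-02-02', 80]]
dfx = pd.DataFrame(data, columns=['NAME', 'TIMESTAMP', 'VALUE'])

# Assumption: rows should be ordered chronologically within each NAME.
dfx['TIMESTAMP'] = pd.to_datetime(dfx['TIMESTAMP'])
dfx = dfx.sort_values(['NAME', 'TIMESTAMP']).reset_index(drop=True)

# Difference only the VALUE column; the first row of each group is NaN.
dfx['DIFF'] = dfx.groupby('NAME')['VALUE'].diff()
print(dfx)
```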
