How to subtract previous row from current row in a pandas dataframe to create a new column restarting the process with each name?

I have a dataframe with 3 columns: the first is a categorical variable with a person's name, the second is the date, and the third is the cumulative occurrences of a problem. I would like to generate a new column with the occurrences per day for each person.

**Name     Date          Cumulative**

John     01-01-2020    0
John     02-01-2020    5
John     03-01-2020    10
John     04-01-2020    12
Peter    01-01-2020    0
Peter    02-01-2020    3
Peter    03-01-2020    7
Peter    04-01-2020    10
James    01-01-2020    0
James    02-01-2020    10
James    03-01-2020    14
James    04-01-2020    18
Kirk     01-01-2020    0
Kirk     02-01-2020    12
Kirk     03-01-2020    12
Kirk     04-01-2020    15
Rob      01-01-2020    0
Rob      02-01-2020    11
Rob      03-01-2020    18
Rob      04-01-2020    23

If I use df['By Day'] = df.Cumulative.diff(), the result is correct except on the first row of each person, where I get a negative number instead of 0 (because the previous person's last value is subtracted from that row's value). It gives me the following:

Name     Date          Cumulative  By Day

John     01-01-2020    0           0
John     01-02-2020    0           0
John     03-01-2020    5           5
John     04-01-2020    10          5
John     05-01-2020    12          2
Peter    01-01-2020    0           -12
Peter    02-01-2020    0           0
Peter    03-01-2020    3           3
Peter    04-01-2020    7           4
Peter    04-01-2020    10          3
James    01-01-2020    0           -10
James    02-01-2020    0           0
James    03-01-2020    10          10
James    04-01-2020    14          4
James    04-01-2020    18          4 
Kirk     01-01-2020    0           -18
Kirk     02-01-2020    0           0
Kirk     03-01-2020    12          12
Kirk     04-01-2020    15          3
Kirk     04-01-2020    19          4
Rob      01-01-2020    5           -14
Rob      02-01-2020    11          6
Rob      03-01-2020    18          7
Rob      04-01-2020    23          5
Rob      04-01-2020    27          4

I would like to compute the difference per name, so that it restarts from 0 every time the person changes. I've thought about iterating by name, but that would do it 5 times for each entry. For example, for Rob I would want 0 6 7 5 4 instead of a series starting with -14 (Rob's first value, 5, minus Kirk's last value, 19).

You should first use the groupby function on the Name column so that diff is applied separately to each person. Then you can use fillna(0) to replace the NaN values (which appear in the first row of each person) with 0:

df["By Day"] = df.groupby("Name").Cumulative.diff().fillna(0)
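
To see this end to end, here is a minimal sketch that rebuilds the first two people from the question's sample table and applies the same groupby/diff/fillna chain. The dataframe construction and the final astype(int) cast are my own additions for illustration (diff produces floats because of the NaN), and the sketch assumes the rows are already in date order within each name:

```python
import pandas as pd

# Sample rows copied from the question (John and Peter only, to keep it short).
df = pd.DataFrame({
    "Name": ["John"] * 4 + ["Peter"] * 4,
    "Date": ["01-01-2020", "02-01-2020", "03-01-2020", "04-01-2020"] * 2,
    "Cumulative": [0, 5, 10, 12, 0, 3, 7, 10],
})

# diff() runs separately inside each Name group, so the first row of every
# person is NaN rather than a difference against the previous person.
# fillna(0) turns that NaN into 0; astype(int) is optional and only restores
# an integer dtype.
df["By Day"] = df.groupby("Name")["Cumulative"].diff().fillna(0).astype(int)

print(df)
```

Note that diff works in row order, so if the data is not already sorted you would probably want to sort by Name and Date first. Also, fillna(0) sets the first day to 0 even when a person's cumulative count starts above 0, which matches what the question asks for.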
