How to subtract the previous row from the current row in a pandas dataframe to create a new column, restarting the process for each name?

Question

I have a dataframe with 3 columns: the first is a categorical variable holding a person's name, the second is the date, and the third is the cumulative occurrences of a problem. I would like to generate a new column with the occurrences per day per person.

**Name     Date          Cumulative**

John     01-01-2020    0
John     02-01-2020    5
John     03-01-2020    10
John     04-01-2020    12
Peter    01-01-2020    0
Peter    02-01-2020    3
Peter    03-01-2020    7
Peter    04-01-2020    10
James    01-01-2020    0
James    02-01-2020    10
James    03-01-2020    14
James    04-01-2020    18
Kirk     01-01-2020    0
Kirk     02-01-2020    12
Kirk     03-01-2020    12
Kirk     04-01-2020    15
Rob      01-01-2020    0
Rob      02-01-2020    11
Rob      03-01-2020    18
Rob      04-01-2020    23

If I use df['By Day'] = df.Cumulative.diff() the result is mostly right, but on the first row of each person it gives a negative number instead of 0 (because it subtracts the previous person's last value from that row's value). It gives the following:

Name     Date          Cumulative  By Day

John     01-01-2020    0           0
John     01-02-2020    0           0
John     03-01-2020    5           5
John     04-01-2020    10          5
John     05-01-2020    12          2
Peter    01-01-2020    0           -12
Peter    02-01-2020    0           0
Peter    03-01-2020    3           3
Peter    04-01-2020    7           4
Peter    04-01-2020    10          3
James    01-01-2020    0           -10
James    02-01-2020    0           0
James    03-01-2020    10          10
James    04-01-2020    14          4
James    04-01-2020    18          4 
Kirk     01-01-2020    0           -18
Kirk     02-01-2020    0           0
Kirk     03-01-2020    12          12
Kirk     04-01-2020    15          3
Kirk     04-01-2020    19          4
Rob      01-01-2020    5           -14
Rob      02-01-2020    11          6
Rob      03-01-2020    18          7
Rob      04-01-2020    23          5
Rob      04-01-2020    27          4

I would like to compute the difference per name, so that it starts from 0 every time the person changes. I've thought about iterating by name, but that would do it 5 times for each entry. For example, for Rob I would want 0 6 7 5 4 instead of starting with -14 (Rob's first entry of 5 minus Kirk's previous 19).

Answer 1

You should first use the groupby function on the Name column to apply the diff function separately for every person. Then you can use fillna(0) to replace the NaN values (which will appear in the first row of every person) with 0:

df["By Day"] = df.groupby("Name").Cumulative.diff().fillna(0)
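
As a minimal sketch of the approach above, the snippet below rebuilds the first two people from the question's sample table and compares a plain diff with the grouped one. The variable name plain and the final astype(int) cast are additions for illustration only, and the rows are assumed to be already sorted by date within each name.

```python
import pandas as pd

# Sample rows for John and Peter, copied from the question's first table.
df = pd.DataFrame({
    "Name": ["John"] * 4 + ["Peter"] * 4,
    "Date": ["01-01-2020", "02-01-2020", "03-01-2020", "04-01-2020"] * 2,
    "Cumulative": [0, 5, 10, 12, 0, 3, 7, 10],
})

# A plain diff ignores the name boundary, so Peter's first row
# is computed against John's last row (0 - 12).
plain = df["Cumulative"].diff()

# A grouped diff restarts for each name. The first row of every
# group is NaN, which fillna(0) replaces with 0.
# astype(int) is optional; it only turns the float result back into integers.
df["By Day"] = df.groupby("Name")["Cumulative"].diff().fillna(0).astype(int)

print(df)
```

Here groupby("Name") keeps each person's rows together for the subtraction, so no explicit loop over names is needed.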

Solution 1 (accepted 2020-07-11 15:30:18)
