How to subtract previous row from current row in a pandas dataframe to create a new column restarting the process with each name?
I have a dataframe with 3 columns: the first column is a categorical variable with a person's name, the second column is the date, and the third column is the cumulative number of occurrences of a problem. I would like to generate a new column with the occurrences per day for each person.
Name   Date        Cumulative
John   01-01-2020  0
John   02-01-2020  5
John   03-01-2020  10
John   04-01-2020  12
Peter  01-01-2020  0
Peter  02-01-2020  3
Peter  03-01-2020  7
Peter  04-01-2020  10
James  01-01-2020  0
James  02-01-2020  10
James  03-01-2020  14
James  04-01-2020  18
Kirk   01-01-2020  0
Kirk   02-01-2020  12
Kirk   03-01-2020  12
Kirk   04-01-2020  15
Rob    01-01-2020  0
Rob    02-01-2020  11
Rob    03-01-2020  18
Rob    04-01-2020  23
If I use df['By Day'] = df.Cumulative.diff() the result is good, but for the first occurrence of each person it gives me a negative number instead of 0 (because it subtracts the previous person's last cumulative value from this person's first value). It gives me the following:
Name   Date        Cumulative  By Day
John   01-01-2020  0           0
John   01-02-2020  0           0
John   03-01-2020  5           5
John   04-01-2020  10          5
John   05-01-2020  12          2
Peter  01-01-2020  0           -12
Peter  02-01-2020  0           0
Peter  03-01-2020  3           3
Peter  04-01-2020  7           4
Peter  04-01-2020  10          3
James  01-01-2020  0           -10
James  02-01-2020  0           0
James  03-01-2020  10          10
James  04-01-2020  14          4
James  04-01-2020  18          4
Kirk   01-01-2020  0           -18
Kirk   02-01-2020  0           0
Kirk   03-01-2020  12          12
Kirk   04-01-2020  15          3
Kirk   04-01-2020  19          4
Rob    01-01-2020  5           -14
Rob    02-01-2020  11          6
Rob    03-01-2020  18          7
Rob    04-01-2020  23          5
Rob    04-01-2020  27          4
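To see why the plain diff() goes wrong, here is a minimal reproduction built from the first table above. The dates are kept as plain strings (an assumption; they play no part in the calculation). Because the numbers come from the first table rather than the second, Rob's first value here is 0 - 15 = -15 rather than -14. Note also that pandas returns NaN, not 0, for the very first row, since there is no previous row to subtract.

```python
import pandas as pd

# Data copied from the first table in the question (dates kept as strings).
df = pd.DataFrame({
    "Name": ["John"] * 4 + ["Peter"] * 4 + ["James"] * 4 + ["Kirk"] * 4 + ["Rob"] * 4,
    "Date": ["01-01-2020", "02-01-2020", "03-01-2020", "04-01-2020"] * 5,
    "Cumulative": [0, 5, 10, 12,
                   0, 3, 7, 10,
                   0, 10, 14, 18,
                   0, 12, 12, 15,
                   0, 11, 18, 23],
})

# A plain diff() subtracts the previous row over the whole frame and ignores Name,
# so each person's first row is compared against the previous person's last row.
df["By Day"] = df["Cumulative"].diff()
print(df)
```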
I would like to take the difference per name, so that it starts from 0 every time the person changes. I've thought about iterating by name, but then it would repeat the calculation 5 times for each entry. For example, for Rob I would want 0 6 7 5 4 instead of starting with -14 (Rob's first entry, 5, minus Kirk's previous entry, 19).
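The rule described here, "start from 0 every time the person is not the same", can also be written without any loop by comparing each name with the previous row's name. This is a swapped-in alternative to groupby, not the asker's or answerer's method, and it is a minimal sketch assuming each person's rows are contiguous and already in date order (as in the tables above). It uses Kirk's and Rob's rows from the second table to check the desired Rob output of 0 6 7 5 4.

```python
import pandas as pd

# Kirk's and Rob's rows from the second table in the question.
df = pd.DataFrame({
    "Name": ["Kirk"] * 5 + ["Rob"] * 5,
    "Cumulative": [0, 0, 12, 15, 19, 5, 11, 18, 23, 27],
})

# True on the first row of each run of identical names
# (the very first row compares against NaN, so it is True as well).
new_person = df["Name"].ne(df["Name"].shift())

# Plain row-to-row difference, then force 0 wherever a new person starts.
df["By Day"] = df["Cumulative"].diff().mask(new_person, 0)
print(df["By Day"].tolist())
```

Unlike groupby, this restarts the count whenever the name changes between consecutive rows, so a person whose rows are split into two separate blocks would be restarted twice.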
You should first use the groupby function on the Name column to apply the diff function separately to every person. Then you can use fillna(0) to replace the NaN values (which will exist in the first row of every person) with 0:
df["By Day"] = df.groupby("Name").Cumulative.diff().fillna(0)
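As a self-contained check of the answer, here is a sketch that applies the same groupby/diff/fillna line to the data from the first table in the question (dates kept as strings, an assumption that does not affect the result). It assumes each person's rows are already sorted by date, since diff() works on row order.

```python
import pandas as pd

# Data copied from the first table in the question.
df = pd.DataFrame({
    "Name": ["John"] * 4 + ["Peter"] * 4 + ["James"] * 4 + ["Kirk"] * 4 + ["Rob"] * 4,
    "Date": ["01-01-2020", "02-01-2020", "03-01-2020", "04-01-2020"] * 5,
    "Cumulative": [0, 5, 10, 12,
                   0, 3, 7, 10,
                   0, 10, 14, 18,
                   0, 12, 12, 15,
                   0, 11, 18, 23],
})

# diff() runs inside each Name group, so every person's first row becomes NaN
# instead of borrowing the previous person's value; fillna(0) turns that NaN into 0.
df["By Day"] = df.groupby("Name")["Cumulative"].diff().fillna(0)
print(df)
```

Because diff() introduces NaN, the new column comes back as float; appending .astype(int) after fillna(0) converts it back to integers if needed.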