How to calculate difference on grouped df?
name date value
a 1/1/2011 3
b 1/1/2011 5
c 1/1/2011 7
a 1/2/2011 6
b 1/2/2011 10
c 1/2/2011 14
I have a df here where the value column holds cumulative stats. So the actual value of
name: a
date: 1/2/2011
is 3, not 6. To get the actual value for a particular day, I need to take that day's value minus the previous day's value. I want to calculate the actual value of each name for each date. Something along the lines of
df.groupby(['name', 'date'])['value'].diff()
but this code is returning an error.
In the end, what I need is
name date actual value
a 1/1/2011 3
b 1/1/2011 5
c 1/1/2011 7
a 1/2/2011 3
b 1/2/2011 5
c 1/2/2011 7
This can be done in a single line and in a vectorized way.
import pandas as pd
df = pd.read_clipboard() # Reading from your question
df['value'] = df.groupby('name')['value'].diff(1).fillna(df['value'])
As was discussed in the comments, it is necessary to reference the original 'value' Series when applying fillna to correctly replace the NaN values from diff (this occurs for the first instance of each label in 'name').
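To make the fillna step concrete, here is a minimal, self-contained sketch that rebuilds the sample data from the question instead of reading the clipboard. The column name 'actual' and the variable per_name_diff are my own choices for illustration, and the sketch assumes the rows are already in date order within each name.

```python
import pandas as pd

# Sample data from the question; 'value' is a cumulative total per name.
df = pd.DataFrame({
    "name": ["a", "b", "c", "a", "b", "c"],
    "date": ["1/1/2011"] * 3 + ["1/2/2011"] * 3,
    "value": [3, 5, 7, 6, 10, 14],
})

# Group by name only, so each group spans several dates.
# diff() leaves NaN on the first row of each name (no previous day to subtract).
per_name_diff = df.groupby("name")["value"].diff()

# Fill those NaNs from the original column: on the first day the
# cumulative value already is the actual value.
df["actual"] = per_name_diff.fillna(df["value"]).astype(int)

print(df)
```

Grouping by both 'name' and 'date', as in the question, would leave one row per group in this data, so there would be no previous row to subtract from; grouping by 'name' alone is what lets diff compare consecutive dates.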
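# Forward-fill any missing cumulative values (fillna(method='ffill') is deprecated in recent pandas)
df['value'] = df['value'].ffill()
# Sort so each name's rows are in date order before differencing
df = df.sort_values(by=['name', 'date'])
# Per-name difference; the first row of each name is NaN
df['actual'] = df.groupby('name')['value'].diff()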
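One caveat with sorting on 'date': if the column holds strings, the sort is lexicographic (for example "1/10/2011" sorts before "1/2/2011"). The sketch below parses the dates first, assuming they are month/day/year strings; that format, the shuffled row order, and the names 'actual' and result are assumptions made for illustration.

```python
import pandas as pd

# Same data as the question, deliberately shuffled to show why sorting matters.
df = pd.DataFrame({
    "name": ["c", "a", "b", "a", "b", "c"],
    "date": ["1/2/2011", "1/1/2011", "1/1/2011", "1/2/2011", "1/2/2011", "1/1/2011"],
    "value": [14, 3, 5, 6, 10, 7],
})

# Assumption: dates are month/day/year. Parse them so the sort is chronological.
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")
df = df.sort_values(["name", "date"])

# Difference within each name, then fill each name's first row
# with its own cumulative value.
df["actual"] = df.groupby("name")["value"].diff().fillna(df["value"]).astype(int)

# Restore the date-then-name layout used in the question.
result = df.sort_values(["date", "name"]).reset_index(drop=True)
print(result)
```

The fillna step is added here only so the output matches the table requested in the question; without it, the first row of each name stays NaN.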