Calculate difference between row values in dataframe based on row value in other column
How can I calculate the difference between row values for each year, starting the calculation anew when the year changes?
I have the following dataframe:
import pandas as pd

df = pd.DataFrame({'year': [2010, 2010, 2010, 2011, 2011, 2011],
                   'measurement1': [1, 3, 5, 2, 3, 6],
                   'measurement2': [2, 1, 1, 3, 2, 4]})
The year is set as the index of the dataframe so that no difference is calculated between the years.
df = df.set_index('year')
The result I would like to get is the following dataframe:
df_result = pd.DataFrame({'year': [2010, 2010, 2010, 2011, 2011, 2011],
                          'measurement1': [0, 2, 2, 0, 1, 3],
                          'measurement2': [0, 1, 0, 0, 1, 2]})
You can see that the difference is calculated between the rows within each year. When there is a measurement for a new year, the calculation starts over. If I use the .diff method, the difference is also calculated between the values of consecutive years.
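To make the problem concrete, here is a minimal sketch of what a plain .diff() does on the data above. The names plain and boundary are only illustrative. The first 2011 row is compared with the last 2010 row, even though year is the index.

```python
import pandas as pd

df = pd.DataFrame({'year': [2010, 2010, 2010, 2011, 2011, 2011],
                   'measurement1': [1, 3, 5, 2, 3, 6],
                   'measurement2': [2, 1, 1, 3, 2, 4]}).set_index('year')

# A plain diff() works purely by row position and ignores the index values,
# so the year boundary is not respected.
plain = df.diff()

# First row of 2011: 2 - 5 for measurement1 and 3 - 1 for measurement2,
# i.e. differences taken against the last 2010 row.
boundary = plain.iloc[3]
print(plain)
```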
How can I calculate the difference only between values measured within the same year?
Many thanks in advance!
Use pandas groupby.
Group by year, then apply diff():
grouped = df.groupby("year").diff()
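Note that a grouped diff() leaves NaN in the first row of each year and keeps the sign of the change, while the expected dataframe in the question shows 0 and unsigned values. A minimal sketch of one way to close that gap, assuming the desired output is the absolute change with 0 for the first row of each year (the names per_year and result are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'year': [2010, 2010, 2010, 2011, 2011, 2011],
                   'measurement1': [1, 3, 5, 2, 3, 6],
                   'measurement2': [2, 1, 1, 3, 2, 4]}).set_index('year')

# Differences are taken within each year; the first row of a year is NaN.
per_year = df.groupby('year').diff()

# Assumption: first row of each year should be 0 and only the magnitude
# of the change matters, as in df_result from the question.
result = per_year.fillna(0).abs().astype(int)
print(result)
```

Drop the .abs() call if signed differences are wanted instead.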
As the transformation is not trivial, I would define a function:
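def delta(x):
    # Difference to the previous row; bfill makes the first row compare with itself (0)
    y = x.shift().bfill() - x
    # Absolute value of the change, returned as a pandas object so the index is kept
    return y.abs()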
Then groupby and transform will do the job:
df = pd.DataFrame({'year': [2010, 2010, 2010, 2011, 2011, 2011],
                   'measurement1': [1, 3, 5, 2, 3, 6],
                   'measurement2': [2, 1, 1, 3, 2, 4]}).set_index('year')
df_result = df.groupby(level=0).transform(delta).astype(int)
It gives:
      measurement1  measurement2
year
2010             0             0
2010             2             1
2010             2             0
2011             0             0
2011             1             1
2011             3             2
(Just use reset_index to get your expected dataframe.)
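As a rough check of the reset_index step, the sketch below rebuilds the result and compares it with the expected dataframe from the question. The helper abs_delta is a hypothetical stand-in that returns a Series through .abs(), and expected is just the question's df_result under another name.

```python
import pandas as pd

def abs_delta(x):
    # Hypothetical helper: absolute change to the previous row,
    # with the first row of each group compared with itself (0).
    return (x.shift().bfill() - x).abs()

df = pd.DataFrame({'year': [2010, 2010, 2010, 2011, 2011, 2011],
                   'measurement1': [1, 3, 5, 2, 3, 6],
                   'measurement2': [2, 1, 1, 3, 2, 4]}).set_index('year')

# Transform within each year, then move 'year' back into a regular column.
df_result = (df.groupby(level=0)
               .transform(abs_delta)
               .astype(int)
               .reset_index())

expected = pd.DataFrame({'year': [2010, 2010, 2010, 2011, 2011, 2011],
                         'measurement1': [0, 2, 2, 0, 1, 3],
                         'measurement2': [0, 1, 0, 0, 1, 2]})

# Compare values only, so platform-dependent integer widths do not matter.
print(df_result.values.tolist() == expected.values.tolist())
```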