Calculate the difference between row values in a dataframe based on the row values in another column

Question

How can I calculate the difference between row values for each year, restarting the calculation whenever the year changes?

I have the following dataframe:

import pandas as pd

df = pd.DataFrame({'year': [2010, 2010, 2010, 2011, 2011, 2011],
                   'measurement1': [1, 3, 5, 2, 3, 6],
                   'measurement2': [2, 1, 1, 3, 2, 4]})

The year is set as the index of the dataframe so that no difference is calculated on the year values themselves:

df = df.set_index('year')

The result I would like to get is the following dataframe:

df_result = pd.DataFrame({'year': [2010, 2010, 2010, 2011, 2011, 2011],
                          'measurement1': [0, 2, 2, 0, 1, 3],
                          'measurement2': [0, 1, 0, 0, 1, 2]})

You can see that the difference is calculated between consecutive rows within each year. When the measurements for a new year begin, the calculation starts over. If I use the .diff method, the difference is also calculated between the values of consecutive years.
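To make the problem concrete, here is a minimal sketch using the data from the question. It only illustrates how a plain diff() subtracts across the 2010/2011 boundary; the names plain and boundary are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'year': [2010, 2010, 2010, 2011, 2011, 2011],
                   'measurement1': [1, 3, 5, 2, 3, 6],
                   'measurement2': [2, 1, 1, 3, 2, 4]}).set_index('year')

# Plain diff() ignores the index and subtracts every row from the next one.
plain = df.diff()

# The first 2011 row (position 3) is compared with the last 2010 row:
# measurement1: 2 - 5 = -3, measurement2: 3 - 1 = 2
boundary = plain.iloc[3]
print(boundary)
```

The first row of the whole frame is NaN because it has no predecessor, but the first row of 2011 is not, which is exactly the unwanted behaviour.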

How can I calculate the difference only between values measured within the same year?

Many thanks in advance!

Answer 1

Use pandas groupby to group by year, then apply diff():

grouped = df.groupby("year").diff()
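Note that a grouped diff() leaves NaN in the first row of each year and keeps the sign of each difference, whereas the expected result in the question shows zeros and non-negative values. A possible sketch that matches that expected output, assuming zeros and absolute differences are really what is wanted (the names diffs and result are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'year': [2010, 2010, 2010, 2011, 2011, 2011],
                   'measurement1': [1, 3, 5, 2, 3, 6],
                   'measurement2': [2, 1, 1, 3, 2, 4]})

cols = ['measurement1', 'measurement2']

# Row-to-row difference within each year; the first row of each year is NaN.
diffs = df.groupby('year')[cols].diff()

# Assumption: fill the first row of each year with 0 and drop the sign.
result = diffs.fillna(0).abs().astype(int)
result.insert(0, 'year', df['year'])
print(result)
```

Here year is kept as an ordinary column, so no reset_index is needed afterwards.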

Answer 2

As the transformation is not trivial, I would define a function:

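import numpy as np

def delta(x):
    # previous value minus current value; bfill makes the first row compare with itself (0)
    y = x.shift().bfill() - x
    # take the absolute value of the difference
    return np.where(y >= 0, y, -y)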

Then groupby transform will do the job:

df = pd.DataFrame({'year': [2010, 2010, 2010, 2011, 2011, 2011],
                   'measurement1': [1, 3, 5, 2, 3, 6],
                   'measurement2': [2, 1, 1, 3, 2, 4]}).set_index('year')

df_result = df.groupby(level=0).transform(delta).astype(int)

It gives:

      measurement1  measurement2
year                            
2010             0             0
2010             2             1

(Just use reset_index to get your expected dataframe.)
