Calculate difference between row values in dataframe based on row value in other column

How can I calculate the difference between row values for each year, starting the calculation anew when the year changes?

I have the following dataframe:

import pandas as pd

df = pd.DataFrame({'year': [2010, 2010, 2010, 2011, 2011, 2011],
                   'measurement1': [1, 3, 5, 2, 3, 6],
                   'measurement2': [2, 1, 1, 3, 2, 4]})

The year is set as the index of the data frame so that no difference is calculated between the years:

df = df.set_index('year')

The result that I would like to get is the following dataframe:

df_result = pd.DataFrame({'year': [2010, 2010, 2010, 2011, 2011, 2011],
                          'measurement1': [0, 2, 2, 0, 1, 3],
                          'measurement2': [0, 1, 0, 0, 1, 2]})

As you can see, the difference is calculated between the rows within each year. When there is a measurement for a new year, the calculation starts over. If I use the .diff method, the difference is also calculated across consecutive years, i.e. between the last row of one year and the first row of the next.
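To illustrate the problem, here is a minimal sketch (assuming only pandas) showing that a plain diff() on the year-indexed frame does not restart at the year boundary: diff() compares each row with the row directly above it and ignores the index values, so the first 2011 row is compared with the last 2010 row.

```python
import pandas as pd

# The question's data, with year moved into the index
df = pd.DataFrame({'year': [2010, 2010, 2010, 2011, 2011, 2011],
                   'measurement1': [1, 3, 5, 2, 3, 6],
                   'measurement2': [2, 1, 1, 3, 2, 4]}).set_index('year')

# diff() compares each row with the row directly above it, whatever the
# index says, so the first 2011 row is compared with the last 2010 row
plain = df.diff()
print(plain)
```

Only the very first row comes out as NaN; the first 2011 row gets 2 - 5 = -3 for measurement1 instead of starting over.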

How can I calculate the difference only between values measured in the same year?

Many thanks in advance!

Use pandas groupby to group by year, then apply diff():

grouped = df.groupby("year").diff()
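Note that groupby(...).diff() on its own leaves NaN in the first row of each year and keeps the sign (2 followed by 1 gives -1), whereas the expected output in the question shows 0 for the first row and only non-negative values. A minimal sketch of one way to close that gap, assuming absolute differences are what is wanted, chains abs(), fillna(0) and astype(int):

```python
import pandas as pd

df = pd.DataFrame({'year': [2010, 2010, 2010, 2011, 2011, 2011],
                   'measurement1': [1, 3, 5, 2, 3, 6],
                   'measurement2': [2, 1, 1, 3, 2, 4]}).set_index('year')

# Difference to the previous row, computed separately within each year
# ("year" is the index name, which groupby accepts as a key)
grouped = df.groupby("year").diff()

# Assumption: absolute differences are wanted, and the first row of each
# year (NaN after diff()) should be 0
result = grouped.abs().fillna(0).astype(int)
print(result)
```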

As the transformation is not trivial, I would define a function:由于转换不是微不足道的,我会定义一个函数:

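import numpy as np

def delta(x):
    y = x.shift().bfill() - x  # previous value minus current; bfill makes the first row compare with itself
    return np.abs(y)           # absolute difference, same as np.where(y >= 0, y, -y)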
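To see what delta does inside a single group, here is a sketch that applies it to the 2010 values of measurement2 as a standalone Series. shift() moves each value down one row, bfill() fills the resulting NaN at the top with the first value (so the first row is compared with itself and yields 0), and the sign is then dropped. The function is written with np.abs, which gives the same result as np.where(y >= 0, y, -y).

```python
import numpy as np
import pandas as pd

def delta(x):
    # Previous value (first row back-filled with itself) minus the current value
    y = x.shift().bfill() - x
    # Absolute value; same result as np.where(y >= 0, y, -y)
    return np.abs(y)

# 2010 values of measurement2 from the question, as a standalone Series
s = pd.Series([2, 1, 1])
print(s.shift().tolist())          # [nan, 2.0, 1.0]: previous value, NaN at the top
print(s.shift().bfill().tolist())  # [2.0, 2.0, 1.0]: the top NaN filled with the first value
print(delta(s).tolist())           # [0.0, 1.0, 0.0]
```

The shift introduces a NaN, which makes the result float; that is why the answer casts back with astype(int) after the transform.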

Then groupby transform will do the job:

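import pandas as pd

df = pd.DataFrame({'year': [2010, 2010, 2010, 2011, 2011, 2011],
                   'measurement1': [1, 3, 5, 2, 3, 6],
                   'measurement2': [2, 1, 1, 3, 2, 4]}).set_index('year')

# Apply delta separately within each year (index level 0); shift() yields floats, so cast back to int
df_result = df.groupby(level=0).transform(delta).astype(int)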

It gives:

      measurement1  measurement2
year                            
2010             0             0
2010             2             1
2010             2             0
2011             0             0
2011             1             1
2011             3             2

(just call reset_index() to turn year back into a column and get your expected dataframe)
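As an end-to-end check, here is a sketch that runs the transform, calls reset_index() to move year back into a column, and compares the result with the df_result frame from the question. delta is written with np.abs (equivalent to the np.where form), and pd.testing.assert_frame_equal is called with check_dtype=False because astype(int) may give int32 rather than int64 on some platforms. The names out and expected are only illustrative.

```python
import numpy as np
import pandas as pd

def delta(x):
    # Previous value within the group minus the current one; bfill() makes
    # the first row of each group its own predecessor, giving 0
    return np.abs(x.shift().bfill() - x)

df = pd.DataFrame({'year': [2010, 2010, 2010, 2011, 2011, 2011],
                   'measurement1': [1, 3, 5, 2, 3, 6],
                   'measurement2': [2, 1, 1, 3, 2, 4]}).set_index('year')

# Per-year transform, then move 'year' back from the index into a column
out = df.groupby(level=0).transform(delta).astype(int).reset_index()

# df_result from the question
expected = pd.DataFrame({'year': [2010, 2010, 2010, 2011, 2011, 2011],
                         'measurement1': [0, 2, 2, 0, 1, 3],
                         'measurement2': [0, 1, 0, 0, 1, 2]})

# Raises AssertionError if the frames differ; dtype differences are ignored
pd.testing.assert_frame_equal(out, expected, check_dtype=False)
print(out)
```

transform (rather than apply) is what keeps the result aligned with the original rows and index, so reset_index() can restore the year column directly.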
