Calculate difference between row values in dataframe based on row value in other column

Question

How can I calculate the difference between row values for each year, starting the calculation anew when the year changes?

I have the following dataframe:

import pandas as pd

df = pd.DataFrame({'year': [2010, 2010, 2010, 2011, 2011, 2011],
                   'measurement1': [1, 3, 5, 2, 3, 6],
                   'measurement2': [2, 1, 1, 3, 2, 4]})

I set the year as the index of the dataframe so that no difference would be calculated between the years:

df = df.set_index('year')

The result that I would like to get is the following dataframe:

df_result = pd.DataFrame({'year': [2010, 2010, 2010, 2011, 2011, 2011],
                          'measurement1': [0, 2, 2, 0, 1, 3],
                          'measurement2': [0, 1, 0, 0, 1, 2]})

You can see that the difference is calculated between the rows within each year; when the measurements for a new year begin, the calculation starts anew. With the .diff method alone, the difference is also calculated across the boundary between consecutive years.
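A minimal sketch of the problem, reusing the question's data: a plain diff() only looks at the previous row and ignores the year index, so the first 2011 row is compared with the last 2010 row.

```python
import pandas as pd

# Same input as in the question, with 'year' as the index
df = pd.DataFrame({'year': [2010, 2010, 2010, 2011, 2011, 2011],
                   'measurement1': [1, 3, 5, 2, 3, 6],
                   'measurement2': [2, 1, 1, 3, 2, 4]}).set_index('year')

# diff() subtracts the previous row regardless of the index value,
# so row 4 (first 2011 row) becomes 2 - 5 = -3 in measurement1
plain = df.diff()
print(plain)
```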

How can I calculate the difference between the values only measured during one year?

Many thanks in advance!

Answer 1

Use pandas groupby to group by year, then apply diff():

grouped = df.groupby("year").diff()
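Note that groupby("year").diff() restarts at each year but does not reproduce the expected dataframe exactly: the first row of each year becomes NaN, and negative steps (such as 2 to 1 in measurement2) stay negative. A minimal sketch, assuming the zeros and absolute values shown in the expected result are what is wanted; the fillna(0).abs() post-processing is an addition, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({'year': [2010, 2010, 2010, 2011, 2011, 2011],
                   'measurement1': [1, 3, 5, 2, 3, 6],
                   'measurement2': [2, 1, 1, 3, 2, 4]}).set_index('year')

# groupby accepts an index level name, so "year" works even as the index
grouped = df.groupby("year").diff()
# First row of each year is NaN; signs are kept (2 -> 1 gives -1)
print(grouped)

# Assumption: the first row should be 0 and differences should be absolute
result = grouped.fillna(0).abs().astype(int)
print(result.reset_index())
```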

Answer 2

As the transformation is not trivial, I would define a function:

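import numpy as np

def delta(x):
    # previous value within the group; bfill() gives the first row a 0 difference
    y = (x.shift().bfill() - x)
    return np.where(y >= 0, y, -y)  # absolute value of each step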
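Applied to a single year of values, the function behaves as follows. This is a small sketch; the Series s is sample data taken from the question's 2010 measurement1 column, and the one-row Series is a hypothetical edge case.

```python
import numpy as np
import pandas as pd

def delta(x):
    # shift() moves each value down one row; the first row becomes NaN,
    # and bfill() fills it with the next value, so its difference is 0
    y = x.shift().bfill() - x
    # absolute value of each step
    return np.where(y >= 0, y, -y)

s = pd.Series([1, 3, 5])
out = delta(s)
print(out)  # float array, because shift() introduced a NaN along the way

# Edge case: a year with only one row has nothing to back-fill from,
# so the result stays NaN
single = delta(pd.Series([7]))
print(single)
```

Because a one-row year stays NaN, the later .astype(int) in the answer would raise an error for such data; the question's data has three rows per year, so it is not affected.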

Then groupby transform will do the job:

import pandas as pd

df = pd.DataFrame({'year': [2010, 2010, 2010, 2011, 2011, 2011],
                   'measurement1': [1, 3, 5, 2, 3, 6],
                   'measurement2': [2, 1, 1, 3, 2, 4]}).set_index('year')

df_result = df.groupby(level=0).transform(delta).astype(int)

It gives:

      measurement1  measurement2
year                            
2010             0             0
2010             2             1
2010             2             0
2011             0             0
2011             1             1
2011             3             2

(Just use reset_index() to get your expected dataframe.)
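A self-contained sketch of the full answer including the reset_index() step, which moves year back from the index into a regular column. The comparison with the question's expected values uses pandas.testing.assert_frame_equal with check_dtype=False, so it does not depend on the platform's default integer width; that check is an addition for illustration.

```python
import numpy as np
import pandas as pd

def delta(x):
    # absolute difference to the previous row within the group; first row -> 0
    y = x.shift().bfill() - x
    return np.where(y >= 0, y, -y)

df = pd.DataFrame({'year': [2010, 2010, 2010, 2011, 2011, 2011],
                   'measurement1': [1, 3, 5, 2, 3, 6],
                   'measurement2': [2, 1, 1, 3, 2, 4]}).set_index('year')

# transform keeps the original shape and index, one group per year
df_result = df.groupby(level=0).transform(delta).astype(int)

# turn the 'year' index back into a column
df_result = df_result.reset_index()
print(df_result)

expected = pd.DataFrame({'year': [2010, 2010, 2010, 2011, 2011, 2011],
                         'measurement1': [0, 2, 2, 0, 1, 3],
                         'measurement2': [0, 1, 0, 0, 1, 2]})
# raises AssertionError if the frames differ
pd.testing.assert_frame_equal(df_result, expected, check_dtype=False)
```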

2 answers

solution1
2 2020-03-09 16:45:08

solution2
0 2020-03-09 16:45:57
