How can I calculate the difference between row values for each year, starting the calculation anew when the year changes?
I have the following dataframe:
df = pd.DataFrame({'year': [2010, 2010, 2010, 2011, 2011, 2011],
                   'measurement1': [1, 3, 5, 2, 3, 6],
                   'measurement2': [2, 1, 1, 3, 2, 4]})
The year is set as the index in the data frame so that no difference is calculated between the years:
df = df.set_index('year')
The result that I would like to get, is the following dataframe:
df_result = pd.DataFrame({'year': [2010, 2010, 2010, 2011, 2011, 2011],
                          'measurement1': [0, 2, 2, 0, 1, 3],
                          'measurement2': [0, 1, 0, 0, 1, 2]})
You can see that the difference is calculated between the rows within each year. When a measurement belongs to a new year, the calculation starts anew. If I use the .diff method on its own, the difference is also calculated between the values of consecutive years.
How can I calculate the difference only between values measured during the same year?
Many thanks in advance!
Use pandas groupby
Group by year, then apply diff()
grouped = df.groupby("year").diff()
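Note that diff() on its own leaves NaN in the first row of each year and keeps the sign of each difference, while the expected frame in the question shows 0 there and only non-negative values. A minimal sketch of one way to close that gap, assuming absolute differences are what is wanted (that reading is inferred from the expected output, and the variable names `signed` and `result` are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'year': [2010, 2010, 2010, 2011, 2011, 2011],
                   'measurement1': [1, 3, 5, 2, 3, 6],
                   'measurement2': [2, 1, 1, 3, 2, 4]})

# diff() within each year: the first row of every group is NaN
# and each difference keeps its sign.
signed = df.groupby('year').diff()

# Assumption: the question wants absolute differences, with 0 in the
# first row of each year, so take abs() and fill the NaNs.
result = signed.abs().fillna(0).astype(int)

# groupby(...).diff() does not return the grouping column, so add it back.
result.insert(0, 'year', df['year'])
print(result)
```

Here year is kept as an ordinary column; if it has already been set as the index, grouping with level=0 should work the same way.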
As the transformation is not trivial, I would define a function:
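import numpy as np

def delta(x):
    # Previous value minus current; bfill makes the first row compare with itself (0).
    y = x.shift().bfill() - x
    return np.where(y >= 0, y, -y)  # absolute value of the difference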
Then groupby with transform will do the job:
df = pd.DataFrame({'year': [2010, 2010, 2010, 2011, 2011, 2011],
                   'measurement1': [1, 3, 5, 2, 3, 6],
                   'measurement2': [2, 1, 1, 3, 2, 4]}).set_index('year')
df_result = df.groupby(level=0).transform(delta).astype(int)
It gives:
      measurement1  measurement2
year
2010             0             0
2010             2             1
2010             2             0
2011             0             0
2011             1             1
2011             3             2
(Just use reset_index to get your expected dataframe.)
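The reset_index step can be sketched end to end as follows. This is a variant rather than the exact function above: it assumes that Series.abs() can stand in for the np.where call on these numeric columns, and the helper name `abs_step` is hypothetical.

```python
import pandas as pd

def abs_step(x):
    # Hypothetical helper: previous value minus current value within the group;
    # bfill makes the first row compare with itself, giving 0.
    return (x.shift().bfill() - x).abs()

df = pd.DataFrame({'year': [2010, 2010, 2010, 2011, 2011, 2011],
                   'measurement1': [1, 3, 5, 2, 3, 6],
                   'measurement2': [2, 1, 1, 3, 2, 4]}).set_index('year')

# Group on the index level holding the year, transform each group,
# then move year back into a regular column.
df_result = (df.groupby(level=0)
               .transform(abs_step)
               .astype(int)
               .reset_index())
print(df_result)
```

After reset_index, year is the first column again and the frame has a default integer index, which is the layout of the expected dataframe in the question.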