Calculate month over month and year over year change for vintage data
I have a dataframe of economic series whose values can be revised every month. Each revision adds a new value for a given `date` and is indexed by `realtime_start` (see the dataframe below). `realtime_start` indicates the date on which `value` for `date` becomes valid. This `value` expires as soon as another one takes its place.
date | realtime_start | value |
---|---|---|
2020-11-01 | 2020-12-04 | 142629.0 |
2020-11-01 | 2021-01-08 | 142764.0 |
2020-11-01 | 2021-02-05 | 142809.0 |
2020-12-01 | 2021-01-08 | 142624.0 |
2020-12-01 | 2021-02-05 | 142582.0 |
2020-12-01 | 2021-03-05 | 142503.0 |
2021-01-01 | 2021-02-05 | 142631.0 |
2021-01-01 | 2021-03-05 | 142669.0 |
2021-01-01 | 2021-04-02 | 142736.0 |
2021-02-01 | 2021-03-05 | 143048.0 |
2021-02-01 | 2021-04-02 | 143204.0 |
2021-03-01 | 2021-04-02 | 144120.0 |
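For anyone who wants to reproduce the problem, the sample data can be rebuilt roughly as follows. This is only a sketch: the variable name `rows` and the choice to parse both date columns with `pd.to_datetime` are assumptions, not part of the original post.

```python
import pandas as pd

# Sample vintage data copied from the table: (date, realtime_start, value)
rows = [
    ("2020-11-01", "2020-12-04", 142629.0),
    ("2020-11-01", "2021-01-08", 142764.0),
    ("2020-11-01", "2021-02-05", 142809.0),
    ("2020-12-01", "2021-01-08", 142624.0),
    ("2020-12-01", "2021-02-05", 142582.0),
    ("2020-12-01", "2021-03-05", 142503.0),
    ("2021-01-01", "2021-02-05", 142631.0),
    ("2021-01-01", "2021-03-05", 142669.0),
    ("2021-01-01", "2021-04-02", 142736.0),
    ("2021-02-01", "2021-03-05", 143048.0),
    ("2021-02-01", "2021-04-02", 143204.0),
    ("2021-03-01", "2021-04-02", 144120.0),
]
df = pd.DataFrame(rows, columns=["date", "realtime_start", "value"])

# Assumption: real timestamps are preferable to strings for comparisons
for col in ("date", "realtime_start"):
    df[col] = pd.to_datetime(df[col])

print(df.dtypes)
```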
I would like an easy way to calculate the month-over-month change in `value` based on the last known entry at `date`.
Calculation method: take the first release for month n (based on `realtime_start`) and subtract the relevant release for month n-1. The relevant release is the most recent release for month n-1 whose `realtime_start` does not exceed the `realtime_start` of month n's first release.
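To make the rule concrete, here is a slow but literal loop version of it, intended as a reference to check faster solutions against rather than as a recommended implementation. The helper name `mom_change_reference` is hypothetical, and the sketch assumes that consecutive distinct dates in the data are consecutive months.

```python
import pandas as pd

def mom_change_reference(df):
    """Literal, loop-based reading of the calculation rule (hypothetical helper)."""
    df = df.sort_values(["date", "realtime_start"])
    first = df.groupby("date").first()      # first release for each date
    dates = list(first.index)
    changes = {d: float("nan") for d in dates}
    # Assumption: the previous distinct date is "month n-1"
    for prev_date, cur_date in zip(dates, dates[1:]):
        cutoff = first.loc[cur_date, "realtime_start"]
        # Releases for month n-1 that were already published at the cutoff
        known = df[(df["date"] == prev_date) & (df["realtime_start"] <= cutoff)]
        if not known.empty:
            changes[cur_date] = first.loc[cur_date, "value"] - known["value"].iloc[-1]
    return pd.Series(changes, name="MoM change", dtype="float64")

# First three months of the sample data
sample = pd.DataFrame({
    "date": pd.to_datetime(["2020-11-01"] * 3 + ["2020-12-01"] * 3 + ["2021-01-01"] * 3),
    "realtime_start": pd.to_datetime([
        "2020-12-04", "2021-01-08", "2021-02-05",
        "2021-01-08", "2021-02-05", "2021-03-05",
        "2021-02-05", "2021-03-05", "2021-04-02",
    ]),
    "value": [142629.0, 142764.0, 142809.0,
              142624.0, 142582.0, 142503.0,
              142631.0, 142669.0, 142736.0],
})
result = mom_change_reference(sample)
print(result)
```

On these three months the loop gives NaN, -140.0 and 49.0, which matches the arithmetic in the question.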
See the desired output below:
date | MoM change |
---|---|
2020-11-01 | NaN |
2020-12-01 | -140 |
2021-01-01 | 49 |
2021-02-01 | 379 |
2021-03-01 | 916 |
For 2021-03-01, the MoM change value is 144120.0 - 143204.0 = 916.0
For 2021-02-01, the MoM change value is 143048.0 - 142669.0 = 379.0
For 2021-01-01, the MoM change value is 142631.0 - 142582.0 = 49.0
Similarly, I would like to calculate the year-over-year change based on the last known values at `date` (the actual dataframe extends further into the past). I would also like to calculate the 3-month rolling average of the month-over-month change based on the last known values at `date`.
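# Use date as the index so that groupby(level=0) groups by date
df = df.set_index('date')
# First release for each date
first = df.groupby(level=0).first()
# True where a release was already published when the next month's first release came out
m = df['realtime_start'].le(first['realtime_start'].shift(-1))
# Latest known release per date, shifted down to line up with the following month
last_val = df['value'].mask(~m).groupby(level=0).last().shift()
mom_change = (first['value'] - last_val).reset_index(name='MoM change')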
Set the index of the dataframe to the `date` column, then group the dataframe on `level=0` and aggregate using `first` to select the first row for each unique `date`.
>>> first
           realtime_start     value
date
2020-11-01     2020-12-04  142629.0
2020-12-01     2021-01-08  142624.0
2021-01-01     2021-02-05  142631.0
2021-02-01     2021-03-05  143048.0
2021-03-01     2021-04-02  144120.0
Shift the `realtime_start` column of the `first` dataframe up by one row, so that each date is paired with the first release date of the following month. Then compare it with the `realtime_start` column in `df` to create a boolean mask `m`.
>>> m
date
2020-11-01     True
2020-11-01     True
2020-11-01    False
2020-12-01     True
2020-12-01     True
2020-12-01    False
2021-01-01     True
2021-01-01     True
2021-01-01    False
2021-02-01     True
2021-02-01     True
2021-03-01    False
Name: realtime_start, dtype: bool
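The comparison that builds the mask relies on index alignment: the left side has repeated index labels and the right side has one value per label. The toy example below illustrates that behaviour under made-up data (the names `left` and `right` and their values are purely illustrative). It also hints at why the method form `.le` is used rather than the `<=` operator, and why a missing value on the right (like the last row after the shift) compares as False.

```python
import pandas as pd

# Repeated labels on the left, one value per label on the right (illustrative data)
left = pd.Series([1, 2, 3, 4], index=["a", "a", "b", "b"])
right = pd.Series([2.0, float("nan")], index=["a", "b"])

# .le aligns on the index, so each "a" row is compared with right["a"], and so on.
# Comparisons against NaN are False, which is what happens for the final date.
aligned = left.le(right)
print(aligned.tolist())

# The operator form does not align differently labelled Series
try:
    left <= right
    operator_failed = False
except ValueError as exc:
    operator_failed = True
    print("operator form raised:", exc)
```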
Now mask the values in the `value` column using the boolean mask above, group the masked column on `level=0`, and aggregate using `last` to select the last remaining value for each unique `date`. Finally, shift the result down by one row so that each value lines up with the following month.
>>> last_val
date
2020-11-01         NaN
2020-12-01    142764.0
2021-01-01    142582.0
2021-02-01    142669.0
2021-03-01    143204.0
Name: value, dtype: float64
Subtract the calculated `last_val` series from the `value` column of the `first` dataframe to get the MoM change.
>>> mom_change
        date  MoM change
0 2020-11-01         NaN
1 2020-12-01      -140.0
2 2021-01-01        49.0
3 2021-02-01       379.0
4 2021-03-01       916.0
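The question also asks about a year-over-year change and a 3-month rolling average, which the steps above do not cover. One possible extension is to parameterise the shift, as sketched below. The function name `vintage_change` and its `periods` argument are hypothetical. The sketch assumes the data has exactly one group of rows per month with no missing months, and it reads "year over year" as a lag of 12 rows, which may not be what the asker intended.

```python
import pandas as pd

def vintage_change(df, periods=1):
    """First release at date n minus the latest release for date n-periods
    that was already published at that time (hypothetical helper)."""
    s = df.sort_values(["date", "realtime_start"]).set_index("date")
    first = s.groupby(level=0).first()
    # Releases published no later than the first release `periods` months ahead
    known = s["realtime_start"].le(first["realtime_start"].shift(-periods))
    base = s["value"].mask(~known).groupby(level=0).last().shift(periods)
    return (first["value"] - base).rename(f"change_{periods}")

# Sample data from the question
dates = (["2020-11-01"] * 3 + ["2020-12-01"] * 3 + ["2021-01-01"] * 3
         + ["2021-02-01"] * 2 + ["2021-03-01"])
starts = ["2020-12-04", "2021-01-08", "2021-02-05",
          "2021-01-08", "2021-02-05", "2021-03-05",
          "2021-02-05", "2021-03-05", "2021-04-02",
          "2021-03-05", "2021-04-02", "2021-04-02"]
values = [142629.0, 142764.0, 142809.0, 142624.0, 142582.0, 142503.0,
          142631.0, 142669.0, 142736.0, 143048.0, 143204.0, 144120.0]
df = pd.DataFrame({"date": pd.to_datetime(dates),
                   "realtime_start": pd.to_datetime(starts),
                   "value": values})

mom = vintage_change(df, periods=1)    # month over month
yoy = vintage_change(df, periods=12)   # all NaN here: the sample spans only 5 months
mom_3m = mom.rolling(3).mean()         # 3-month rolling average of the MoM change
print(mom)
print(mom_3m)
```

With `periods=1` this reproduces the MoM column above. Note that the rolling average here is taken over first-release MoM changes only; if the average should instead use revised values as known at each date, a different construction would be needed.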
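PS: The dataframe must be sorted on the `date` column for this solution to work properly. Within each `date`, the rows must also be in ascending `realtime_start` order, as in the sample data, because `first` and `last` select rows by position.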
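If the input order is not guaranteed, sorting explicitly before applying the steps is a cheap safeguard. The small example below uses deliberately shuffled rows (the frame `unsorted` is illustrative) to show how the first release per date can be picked incorrectly without the sort.

```python
import pandas as pd

# Four rows from the sample data, deliberately out of order
unsorted = pd.DataFrame({
    "date": pd.to_datetime(["2020-12-01", "2020-11-01", "2020-11-01", "2020-12-01"]),
    "realtime_start": pd.to_datetime(["2021-02-05", "2021-01-08", "2020-12-04", "2021-01-08"]),
    "value": [142582.0, 142764.0, 142629.0, 142624.0],
})

# Without sorting, "first" simply means first in row order within each date
wrong_first = unsorted.groupby("date")["value"].first()

# Sort by date, then by release date, before grouping
ordered = unsorted.sort_values(["date", "realtime_start"]).reset_index(drop=True)
right_first = ordered.groupby("date")["value"].first()

print(wrong_first.tolist())
print(right_first.tolist())
```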