
How to Calculate Year over Year Percentage Change in Dataframe with Datetime Index based on Date and not number of Periods

I have multiple DataFrames of macroeconomic time series. In each of them I want to add a column showing the year-over-year percentage change. Ideally I would do this in a for loop so I don't have to repeat the process for every DataFrame. However, the series do not share the same frequency: GDP is quarterly, PCE is monthly and S&P returns are daily, so I cannot hard-code a single number of periods. Since every DataFrame already has a DatetimeIndex, I would like the percentage change to be calculated based on the dates instead. Is that possible?

Please see examples of my Dataframes below:

print(gdp):
Date         GDP           
1947-01-01  2.034450e+12
1947-04-01  2.029024e+12
1947-07-01  2.024834e+12
1947-10-01  2.056508e+12
1948-01-01  2.087442e+12
                  ...
2021-04-01  1.936831e+13
2021-07-01  1.947889e+13
2021-10-01  1.980629e+13
2022-01-01  1.972792e+13
2022-04-01  1.969946e+13
[302 rows x 1 columns]

print(pce):
Date        PCE        
1960-01-01  1.695549
1960-02-01  1.706421
1960-03-01  1.692806
1960-04-01  1.863354
1960-05-01  1.911975
              ...
2022-02-01  6.274030
2022-03-01  6.638595
2022-04-01  6.269216
2022-05-01  6.324989
2022-06-01  6.758935
[750 rows x 1 columns]

print(spx):
Date          SPX     
1928-01-03    17.76
1928-01-04    17.72
1928-01-05    17.55
1928-01-06    17.66
1928-01-09    17.59
             ...
2022-08-19  4228.48
2022-08-22  4137.99
2022-08-23  4128.73
2022-08-24  4140.77
2022-08-25  4199.12
[24240 rows x 1 columns]

Instead of doing this:

gdp['GDP'] = gdp['GDP'].pct_change(4)
pce['PCE'] = pce['PCE'].pct_change(12)
spx['SPX'] = spx['SPX'].pct_change(252)

I would like a for loop that does this for all DataFrames without specifying the number of periods, only that I want the percentage change from one year to the next.

Given:

import pandas as pd

d = {'Date': [ '2021-02-01',
               '2021-03-01',
               '2021-04-01',
               '2021-05-01',
               '2021-06-01',
               '2022-02-01',
               '2022-03-01',
               '2022-04-01',
               '2022-05-01',
               '2022-06-01'],
     'PCE': [  1.695549, 1.706421, 1.692806, 1.863354, 1.911975,
               6.274030, 6.638595, 6.269216, 6.324989, 6.758935]}

pce = pd.DataFrame(d)
pce = pce.set_index('Date')
pce.index = pd.to_datetime(pce.index)

You could create a new DataFrame that holds a copy of the datetime index as a column, resample it to annual frequency ('A') and count the rows that fall into each calendar year. (Recent pandas releases deprecate the 'A' alias in favour of 'YE'.)

pce_annual_rows = pce.index.to_frame()
resampled_annual = pce_annual_rows.resample('A').count()

Next, take the second-to-last yearly row count and pass it as the periods argument to the pct_change method.

The second-to-last count is used because the final year is often incomplete, and its count would give the wrong number of periods. This assumes every DataFrame covers more than one calendar year; otherwise you'll get an IndexError.

periods_per_year = resampled_annual['Date'].iloc[-2]
pce['ROC'] = pce['PCE'].pct_change(periods_per_year)

This produces the following output:

                 PCE       ROC
Date
2021-02-01  1.695549       NaN
2021-03-01  1.706421       NaN
2021-04-01  1.692806       NaN
2021-05-01  1.863354       NaN
2021-06-01  1.911975       NaN
2022-02-01  6.274030  2.700294
2022-03-01  6.638595  2.890362
2022-04-01  6.269216  2.703446
2022-05-01  6.324989  2.394411
2022-06-01  6.758935  2.535054

This solution isn't very elegant; maybe someone will come up with a simpler idea.
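One date-based alternative, offered only as a sketch and not as part of the approach above: look up, for each row, the value observed one calendar year earlier and divide by it. The helper name yoy_change_by_date is hypothetical. It assumes a unique DatetimeIndex, and it uses a forward fill so that a daily series still finds the last observation on or before the target date when that date falls on a weekend or holiday.

```python
import pandas as pd


def yoy_change_by_date(series):
    """Hypothetical helper: year-over-year change looked up by date, not by row count."""
    series = series.sort_index()  # reindex with method= needs a monotonic index
    # Target date for every row: the same calendar date one year earlier.
    one_year_earlier = series.index - pd.DateOffset(years=1)
    # "ffill" picks the last observation on or before each target date;
    # targets that lie before the first observation come back as NaN.
    previous = series.reindex(one_year_earlier, method="ffill")
    # Use the raw values so the division is positional instead of label-aligned.
    return series / previous.to_numpy() - 1


# Same sample data as in the answer above.
pce = pd.DataFrame(
    {"PCE": [1.695549, 1.706421, 1.692806, 1.863354, 1.911975,
             6.274030, 6.638595, 6.269216, 6.324989, 6.758935]},
    index=pd.to_datetime(["2021-02-01", "2021-03-01", "2021-04-01",
                          "2021-05-01", "2021-06-01", "2022-02-01",
                          "2022-03-01", "2022-04-01", "2022-05-01",
                          "2022-06-01"]),
)
pce["YoY"] = yoy_change_by_date(pce["PCE"])
print(pce)
```

Because the lookup is by date, the same function should work unchanged for quarterly, monthly and daily data. If the series has long gaps, the forward fill may match a stale observation; passing a tolerance to reindex (for example tolerance=pd.Timedelta(days=7)) is one way to limit that.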

To build a for loop that does this for every DataFrame, it helps to give the column you want to apply pct_change to the same name in each DataFrame, or to pass the column name along with each DataFrame.
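A minimal sketch of such a loop, under a few assumptions: the helper add_yoy_column and the two small frames are made up for illustration, and the yearly row count is taken with a groupby on the index year rather than resample('A'), which gives the same counts without depending on a frequency alias.

```python
import pandas as pd


def add_yoy_column(df, column, new_column="YoY"):
    """Hypothetical helper: infer periods per year from the index, then call pct_change."""
    # Number of rows in each calendar year, sorted by year.
    rows_per_year = df[column].groupby(df.index.year).size()
    if len(rows_per_year) < 2:
        raise ValueError("need data from at least two calendar years")
    # Second-to-last year, because the last one may be incomplete.
    periods = int(rows_per_year.iloc[-2])
    df[new_column] = df[column].pct_change(periods)
    return df


# Made-up stand-ins for the real gdp and pce frames.
quarterly = pd.DataFrame(
    {"GDP": [100.0, 101.0, 102.0, 103.0, 110.0, 111.0]},
    index=pd.to_datetime(["2020-01-01", "2020-04-01", "2020-07-01",
                          "2020-10-01", "2021-01-01", "2021-04-01"]),
)
monthly = pd.DataFrame(
    {"PCE": [float(i) for i in range(1, 15)]},
    index=pd.date_range("2020-01-01", periods=14, freq="MS"),
)

# Map each column name to its frame so the loop knows which column to use.
frames = {"GDP": quarterly, "PCE": monthly}
for column, frame in frames.items():
    add_yoy_column(frame, column)

print(quarterly)
print(monthly)
```

Like the approach above, this still counts rows, so it assumes the second-to-last year is complete and has no missing observations.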

