Removing time series seasonality with monthly dummies

I am working with weekly Google search volume data (scaled 0-100) for different words from 2018 to 2021. For example, my data for the term "gold price" reads as follows:

gold <- ts(SVI_Log_returns_Winsorized$`gold price`, frequency = 52, start = c(2018, 1), end = c(2021, 52))

Time Series:
Start = c(2018, 1) 
End = c(2021, 52) 
Frequency = 52 
  [1] -0.10919929  0.10919929 -0.03509132  0.00000000  0.13353139 -0.16989904 -0.16034265  0.04255961 -0.04255961 -0.09097178  0.13353139  0.00000000  0.04082199 -0.04082199  0.00000000 -0.08701138
 [17]  0.00000000 -0.04652002  0.09097178 -0.04445176 -0.04652002  0.00000000  0.00000000  0.04652002  0.04445176 -0.04445176  0.00000000  0.08701138  0.00000000 -0.04255961  0.00000000  0.26570317
 [33] -0.14310084 -0.03922071 -0.12783337  0.08701138 -0.08701138  0.00000000  0.04445176  0.16034265 -0.07696104 -0.04082199 -0.04255961  0.00000000 -0.04445176  0.08701138 -0.08701138  0.08701138
 [49] -0.04255961  0.23180161  0.15906469 -0.15906469 -0.10919929 -0.08004271  0.00000000  0.08004271 -0.12260232  0.00000000  0.08338161 -0.04082199  0.00000000 -0.04255961  0.04255961 -0.04255961
 [65] -0.04445176 -0.04652002  0.04652002 -0.04652002  0.04652002  0.00000000  0.08701138 -0.08701138 -0.04652002  0.25131443 -0.07696104  0.27763174  0.08701138 -0.11778304 -0.06453852  0.03278982
 [81] -0.03278982  0.03278982  0.30228422  0.00000000 -0.15028220  0.02666825 -0.08223810 -0.05884050 -0.06252036  0.00000000  0.00000000 -0.10178269  0.00000000  0.00000000 -0.07410797  0.03774033
 [97] -0.03774033 -0.03922071 -0.04082199  0.04082199  0.03922071  0.03774033  0.10536052  0.15415068  0.25131443 -0.22977835 -0.03390155  0.09844007 -0.06453852 -0.06899287  0.22314355  0.30228422
[113] -0.20875481  0.30228422  0.08252102 -0.22977835 -0.22977835  0.00000000  0.07696104  0.05406722 -0.22977835 -0.07410797 -0.05264373  0.05264373 -0.16705408  0.00000000  0.05884050 -0.05884050
[129]  0.08701138  0.02739897  0.12675171 -0.10008346  0.30228422  0.30228422  0.00000000 -0.13503628 -0.21414799 -0.22977835 -0.06453852 -0.19574458 -0.05556985  0.13353139 -0.10536052  0.00000000
[145]  0.00000000  0.00000000  0.05406722 -0.14107860  0.24116206 -0.10008346  0.07598591 -0.02469261 -0.07796154  0.02666825  0.00000000  0.02597549  0.28768207 -0.14458123 -0.04546237 -0.02353050
[161]  0.30228422 -0.22977835 -0.02469261  0.13976194  0.06317890 -0.08515781 -0.11778304 -0.07796154  0.02666825 -0.05406722 -0.02817088  0.02817088 -0.05715841  0.11122564  0.12361396 -0.04762805
[177] -0.05001042 -0.02597549 -0.05406722  0.05406722  0.00000000 -0.08223810 -0.05884050  0.02985296  0.00000000 -0.02985296 -0.03077166  0.24783616 -0.15822401 -0.05884050 -0.06252036 -0.06669137
[193]  0.12921173  0.05884050  0.00000000 -0.02898754  0.00000000 -0.02985296  0.08701138 -0.02817088  0.10821358 -0.05264373 -0.08455739  0.02898754 -0.05884050  0.05884050 -0.02898754 -0.06062462

Plotting this data gives the following:

[Figure: differenced log search volume for "gold price"]

As seen here, the data seems to have seasonal components.

Decomposing the data using

decomp <- stl(gold, s.window = "periodic")

plot(decomp)

gives the following:

[Figure: STL decomposition of "gold price"]

Looking at the seasonal graph, it seems like the search volume for the word "gold price" drops a lot during the middle of each year.
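One possible way to act on that seasonal panel is to pull the seasonal component out of the `stl` object and subtract it from the series. The snippet below is only a sketch: `gold` is replaced by simulated noise of the same length and frequency (an assumption, since the real data is not reproduced here), and `gold_sa` is a hypothetical name for the adjusted series.

```r
# Simulated stand-in for the weekly series (assumption: 208 weeks, frequency 52).
set.seed(1)
gold <- ts(rnorm(208, sd = 0.1), frequency = 52, start = c(2018, 1))

# Same decomposition as above; "periodic" forces an identical seasonal
# pattern in every year.
decomp <- stl(gold, s.window = "periodic")

# The components are stored as columns of decomp$time.series.
seasonal <- decomp$time.series[, "seasonal"]

# Seasonally adjusted series: original minus the estimated seasonal component.
gold_sa <- gold - seasonal
```

With only four years of weekly data, each of the 52 seasonal values is estimated from four observations, so the adjusted series should be read with some caution.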

I'm not sure how to eliminate the seasonality in my data. I've found a couple of papers that regress such data on monthly dummies and keep the residuals. I've tried to replicate this, but I'm at a loss as to where to start. Can somebody advise me on how to approach the problem of seasonality?

Thanks!

I think the book "Forecasting: Principles and Practice" is a good starting point.
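As a rough illustration of the monthly-dummy regression mentioned in the question, the sketch below regresses a weekly series on month dummies and keeps the residuals. It is not taken from the papers the question refers to. It uses simulated data in place of `gold`, and it assumes that week w of a year starts on 1 January plus 7 * (w - 1) days, which is only an approximation of how the weekly data was actually dated. The names `week_start`, `month`, and `gold_deseason` are made up for the example.

```r
# Simulated stand-in for the weekly series; replace with the real `gold` object.
set.seed(1)
gold <- ts(rnorm(208, sd = 0.1), frequency = 52, start = c(2018, 1))

# Assumption: week w of year y starts on 1 January + 7 * (w - 1) days.
yr <- as.integer(floor(time(gold) + 1e-6))  # small offset guards against rounding
wk <- as.integer(cycle(gold))               # week index 1..52 within each year
week_start <- as.Date(paste0(yr, "-01-01")) + 7 * (wk - 1)

# Month of each week as a 12-level factor; lm() expands it into
# an intercept plus 11 dummy variables.
month <- factor(format(week_start, "%m"), levels = sprintf("%02d", 1:12))

# Regress the series on the monthly dummies and keep the residuals.
fit <- lm(as.numeric(gold) ~ month)
gold_deseason <- ts(resid(fit), frequency = 52, start = start(gold))
```

Because the regression includes a full set of month effects, the residuals average to zero within each month, which is the sense in which the monthly pattern has been removed. If real dates are available for each week, using them directly would be more reliable than the week-to-date approximation above.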
