简体   繁体   English

bsts - 测试时间序列的趋势

[英]bsts - test for trend in time series

I am using the bsts package to analyze several time series, to find out whether the values in the series are increasing, decreasing or remaining stable along the time period.我正在使用 bsts package 来分析几个时间序列,以确定序列中的值是在增加、减少还是在一段时间内保持稳定。

I am still learning bayesian structural models, so forgive me for any imprecision.我仍在学习贝叶斯结构模型,所以请原谅我的任何不精确之处。 I see I can extract a trend component from the model, which looks like it is increasing (code below).我看到我可以从 model 中提取趋势分量,它看起来正在增加(下面的代码)。 But how can I formally test the direction of the trend?但是我怎样才能正式测试趋势的方向呢? Should I remove the local linear trend and instead do a regression against time?我应该删除局部线性趋势,而是对时间进行回归吗? Or is there a way to answer that question using the trend component already extracted?或者有没有办法使用已经提取的趋势成分来回答这个问题?

Here is one of the time series:这是时间序列之一:

          Jan      Feb      Mar      Apr      May      Jun      Jul      Aug      Sep      Oct      Nov      Dec
2013 5.454450 5.351702 5.450414 5.490382 5.367216 5.404927 5.267974 5.211403 5.439326 5.394489 5.425333 5.413367
2014 5.365040 5.359957 5.388980 5.272279 5.337346 5.383114 5.211803 5.567671 5.446918 5.427461 5.490386 5.429216
2015 5.522443 5.432766 5.581413 5.307201 5.452850 5.426128 5.372678 5.524919 5.400167 5.453443 5.596288 5.424220
2016 5.633506 5.553917 5.553649 5.444297 5.551022 5.461216 5.503183 5.426033 5.454331 5.643385 5.575780 5.486073
2017 5.656180 5.411885 5.477615 5.437005 5.434144 5.404344 5.481005 5.437255 5.352800 5.485022 5.441534 5.472936
2018 5.366347 5.381583 5.401431 5.479976 5.439315 5.319484 5.421672 5.319448 5.574673 5.472161 5.479900 5.539380
2019 5.390139 5.429426 5.613977 5.343529 5.487940 5.624361 5.381285 5.366611 5.565688 5.503865 5.491821 5.486168
2020 5.475343 5.493866 6.010556 5.690947 5.656557 5.420500 5.453484 5.566972 5.369799 5.435967 5.613358 5.409345

And the code I used to extract the trend (from https://multithreaded.stitchfix.com/blog/2016/04/21/forget-arima/ ), y is the time series above:而我用来提取趋势的代码(来自https://multithreaded.stitchfix.com/blog/2016/04/21/forget-arima/ ),y就是上面的时间序列:

ss <- AddLocalLinearTrend(list(), y)
ss <- AddSeasonal(ss, y, nseasons = 12)
bsts.model <- bsts(y, state.specification = ss, niter = 500, ping=0, seed=2016)

burn <- SuggestBurn(0.1, bsts.model)

components <- cbind.data.frame(
  colMeans(bsts.model$state.contributions[-(1:burn),"trend",]),                               
  colMeans(bsts.model$state.contributions[-(1:burn),"seasonal.12.1",]),
  as.Date(time(Y)))  

names(components) <- c("Trend", "Seasonality", "Date")
components <- melt(components, id="Date")
names(components) <- c("Date", "Component", "Value")

ggplot(data=components, aes(x=Date, y=Value)) + geom_line() + 
  theme_bw() + theme(legend.title = element_blank()) + ylab("") + xlab("") + 
  facet_grid(Component ~ ., scales="free") + guides(colour=FALSE) + 
  theme(axis.text.x=element_text(angle = -90, hjust = 0))

This can be done by inspecting the draws more closely.这可以通过更仔细地检查绘图来完成。

Let's take this apart:让我们把这个分开:

colMeans(bsts.model$state.contributions[-(1:burn), "trend",]) 

bsts.model$state.contributions is an array of shape (iter, metric, obs) . bsts.model$state.contributions是一个形状数组(iter, metric, obs)
bsts.model$state.contributions[-(1:burn), "trend", ] Turns this into a (non_burnin_iter, obs) matrix. bsts.model$state.contributions[-(1:burn), "trend", ]将其转换为(non_burnin_iter, obs)矩阵。

The call to colMeans gets the average trend but you can extract any quantile or shares of properties as well:colMeans的调用获取平均趋势,但您也可以提取任何分位数或属性份额:

m <- bsts.model$state.contributions[-(1:burn), "trend", ]

# Average trend, what you are getting now.
colMeans(m)  

# Posterior probability that the effect of the local trend is positive
# Probably what you want though note that it's conceptually not the
# same as a p-value. It's more what people think 1-pvalue is.
colMeans(m > 0)  

# Lower bound of the 95% credible interval of the trend.
# Useful if you have a region or practical equivalence (ROPE).
apply(m, 2, quantile, p = 0.025) 

I thought of a solution (code below), please let me know if it makes sense.我想到了一个解决方案(下面的代码),如果它有意义请告诉我。

Basically, for every point in the time series, I:基本上,对于时间序列中的每个点,我:

  1. check if the lower limit of the 95% confidence interval is bigger than the upper limit of any of the points before it.检查 95% 置信区间的下限是否大于其之前任何点的上限。

  2. check if the upper limit of the 95% confidence interval is smaller than the lower limit of any of the points before it.检查 95% 置信区间的上限是否小于其之前任何点的下限。

This way, if the time series is stable, both 1 and 2 should be small.这样,如果时间序列稳定,1 和 2 都应该很小。 If it is increasing, 1 should be bigger than 2. If it is decreasing, 2 should be bigger than 1. If it is fluctuating, both 1 and 2 should be big.如果是增加的,1应该大于2。如果是减少的,2应该大于1。如果是波动的,1和2应该都大。

Let's say, if the difference between 1 and 2 is less than 10% of the time series length, I consider it stable.比方说,如果 1 和 2 之间的差异小于时间序列长度的 10%,我认为它是稳定的。

Any reason this could give me misleading results?这有什么理由会给我误导性的结果吗?

components <- cbind.data.frame(
          apply(bsts.model$state.contributions[-(1:burn),"trend",], 2, quantile, p = 0.025),
          apply(bsts.model$state.contributions[-(1:burn),"trend",], 2, quantile, p = 0.925))  
        names(components) <- c("Trend_lci", "Trend_uci")
        
        test_increasing <-
          sapply(1:nrow(components), function(i){ 
            
            any(components$Trend_lci[i] > components$Trend_uci[1:i])
            
          }) %>% sum
        
        test_decreasing <-
          sapply(1:nrow(components), function(i){ 
            
            any(components$Trend_uci[i] < components$Trend_lci[1:i])
            
          }) %>% sum
        
        test_increasing - test_decreasing

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM