
Group 1's and 0's into two phases, identify start and end, and count duration

I am doing some cyclical analysis.

I have a variable X, which is TRUE if the economy is in a state of contraction and FALSE otherwise:

X
##[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

....

which I converted to 0's and 1's with

X2<-as.ts(X*1)

Then I have a date sequence.

td
## [1] "2000-01-31" "2000-02-29" "2000-03-31" "2000-04-30" "2000-05-31" "2000-06-30"

....

I then used 'zoo' to index X2, ordered by td.

library(zoo)
na_ts = zoo(x=X2, order.by=td) 

Now for my question: I want to identify the dates when the value changes, and count how long the series has stayed at 1 or at 0.

So desired outcome:

start        end          type         duration
2000-01-31 - 2001-05-31   contraction  17 months
2001-06-30 - 2004-05-31   expansion    ....

Would anybody help me please? Many thanks in advance.

You can use the run-length encoding of X to split up the time series into consecutive elements with the same value:

# Reproducible example
X <- c(FALSE, FALSE, FALSE, TRUE, TRUE, FALSE)
td <- c("2000-01-31", "2000-02-29", "2000-03-31", "2000-04-30", "2000-05-31", "2000-06-30")
library(zoo)
na_ts <- zoo(x = X, order.by = td)

# Split with run-length encoding
runlens <- rle(X)
(ts.spl <- split(na_ts, rep(seq_along(runlens$lengths), times=runlens$lengths)))
# $`1`
# 2000-01-31 2000-02-29 2000-03-31 
#      FALSE      FALSE      FALSE 
# 
# $`2`
# 2000-04-30 2000-05-31 
#       TRUE       TRUE 
# 
# $`3`
# 2000-06-30 
#      FALSE 
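
To see why the split works, it can help to look at what rle() returns on its own. The sketch below reuses X from the example above; group_id is only an illustrative name for the grouping vector that gets passed to split():

```r
X <- c(FALSE, FALSE, FALSE, TRUE, TRUE, FALSE)

# rle() collapses consecutive equal values into (length, value) pairs
runlens <- rle(X)
runlens$lengths   # 3 2 1
runlens$values    # FALSE TRUE FALSE

# Repeat each run's index once per observation in that run,
# giving one group label per element of X
group_id <- rep(seq_along(runlens$lengths), times = runlens$lengths)
group_id          # 1 1 1 2 2 3
```

Every observation in the same run gets the same label, so split() puts each phase into its own list element.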

Now you can extract whatever information you want from each time series stored in the list ts.spl. For instance:

dat <- data.frame(start = sapply(ts.spl, start),
                  end = sapply(ts.spl, end),
                  val = ifelse(runlens$values, "contraction", "expansion"))
dat$days <- as.numeric(as.Date(dat$end) - as.Date(dat$start), units="days")
dat
#        start        end         val days
# 1 2000-01-31 2000-03-31   expansion   60
# 2 2000-04-30 2000-05-31 contraction   31
# 3 2000-06-30 2000-06-30   expansion    0

This approach is an example of split-apply-combine: we split the original data based on some property of the data, apply a function to extract the information of interest from each piece, and then combine the results back together.
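
The three steps can also be written out one at a time. This is only a minimal sketch in base R that works on the plain date vector rather than the zoo object; pieces, summaries and result are illustrative names, not part of the answer above:

```r
X  <- c(FALSE, FALSE, FALSE, TRUE, TRUE, FALSE)
td <- c("2000-01-31", "2000-02-29", "2000-03-31",
        "2000-04-30", "2000-05-31", "2000-06-30")

runlens  <- rle(X)
group_id <- rep(seq_along(runlens$lengths), times = runlens$lengths)

# Split: one vector of dates per run
pieces <- split(td, group_id)

# Apply: summarise each run as a one-row data frame
summaries <- lapply(pieces, function(d) {
  data.frame(start = d[1], end = d[length(d)], n = length(d))
})

# Combine: stack the one-row summaries into a single table
result <- do.call(rbind, summaries)
result
```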

Here is the code after my slight modification. Thanks, josilber! In cyclical analysis we usually work with monthly data, because dating phases to the day would not be accurate. Also, the economy is always in either recession or expansion, so a phase should never have a duration of zero.

na_ts = zoo(x=X, order.by=td)

# Split with run-length encoding

runlens <- rle(X)
(ts.spl <- split(na_ts, rep(seq_along(runlens$lengths), times=runlens$lengths)))

dat <- data.frame(start = sapply(ts.spl, start),
                  end = sapply(ts.spl, end),
                  val = ifelse(runlens$values, "contraction", "expansion"))
# With one observation per month, each run length is the phase duration in months
dat$months <- runlens$lengths
dat
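
If the zoo object is not needed for anything else, the same table could also be built directly from the run lengths. This is a rough sketch rather than the code above: phase_table is a hypothetical helper name, and it assumes the dates are sorted, contain one entry per month, and have the same length as X:

```r
# Hypothetical helper: one row per phase, built from run-length indices
phase_table <- function(x, dates) {
  stopifnot(length(x) == length(dates))
  runs      <- rle(x)
  end_idx   <- cumsum(runs$lengths)            # position of the last month of each run
  start_idx <- end_idx - runs$lengths + 1      # position of the first month of each run
  data.frame(
    start  = dates[start_idx],
    end    = dates[end_idx],
    type   = ifelse(runs$values, "contraction", "expansion"),
    months = runs$lengths                      # assumes monthly observations
  )
}

X  <- c(FALSE, FALSE, FALSE, TRUE, TRUE, FALSE)
td <- as.Date(c("2000-01-31", "2000-02-29", "2000-03-31",
                "2000-04-30", "2000-05-31", "2000-06-30"))

phases <- phase_table(X, td)
phases
```

Because each observation is one month, a run covering 2000-01-31 to 2001-05-31 would have length 17, which matches the duration given in the question.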
