
Group 1's and 0's into two phases, identify start and end, and count duration

I am doing some cyclical analysis.

I have a variable X, which is TRUE when the series is in a state of contraction and FALSE otherwise:

X
##[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

....

which I changed into 0's and 1's with

X2 <- as.ts(X * 1)

Then I have a date sequence:

td
## [1] "2000-01-31" "2000-02-29" "2000-03-31" "2000-04-30" "2000-05-31" "2000-06-30"

....

I then used 'zoo' to index X2, ordered by td:

library(zoo)
na_ts <- zoo(x = X2, order.by = td)

Now for my question. I want to identify the dates when the value changes, and count how long the series has stayed at 1 and at 0.

So the desired outcome is:

start        end          type          duration
2000-01-31   2001-05-31   contraction   17 months
2001-06-30   2004-05-31   expansion     ....

Could anybody help me, please? Many thanks in advance.

You can use the run-length encoding of X to split the time series into runs of consecutive elements with the same value:

# Reproducible example
X <- c(FALSE, FALSE, FALSE, TRUE, TRUE, FALSE)
td <- c("2000-01-31", "2000-02-29", "2000-03-31", "2000-04-30", "2000-05-31", "2000-06-30")
library(zoo)
na_ts <- zoo(x = X, order.by = td)

# Split with run-length encoding
runlens <- rle(X)
(ts.spl <- split(na_ts, rep(seq_along(runlens$lengths), times=runlens$lengths)))
# $`1`
# 2000-01-31 2000-02-29 2000-03-31 
#      FALSE      FALSE      FALSE 
# 
# $`2`
# 2000-04-30 2000-05-31 
#       TRUE       TRUE 
# 
# $`3`
# 2000-06-30 
#      FALSE 
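
To see why this grouping works, it can help to look at what rle() returns on its own. The following is a small base-R sketch using the same toy X as above; the name run_id is introduced here purely for illustration and is not part of the original answer.

```r
# Same toy input as the reproducible example above
X <- c(FALSE, FALSE, FALSE, TRUE, TRUE, FALSE)

# rle() collapses the vector into runs: how long each run is, and its value
runlens <- rle(X)
runlens$lengths  # 3 2 1
runlens$values   # FALSE TRUE FALSE

# Repeating each run's position by its length gives one group label per
# observation; this is the vector that split() uses as its grouping factor
run_id <- rep(seq_along(runlens$lengths), times = runlens$lengths)
run_id           # 1 1 1 2 2 3
```

Each observation ends up labelled with the number of the run it belongs to, so splitting on that label yields one piece per phase.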

Now you can extract whatever information you want from each time series stored in the list ts.spl. For instance:

dat <- data.frame(start = sapply(ts.spl, start),
                  end = sapply(ts.spl, end),
                  val = ifelse(runlens$values, "contraction", "expansion"))
dat$days <- as.numeric(as.Date(dat$end) - as.Date(dat$start), units="days")
dat
#        start        end         val days
# 1 2000-01-31 2000-03-31   expansion   60
# 2 2000-04-30 2000-05-31 contraction   31
# 3 2000-06-30 2000-06-30   expansion    0

This approach is an example of split-apply-combine: we split the original data based on some property of the data, apply a function to extract the information of interest from each piece, and then combine the results back together.
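
If the per-phase pieces themselves are not needed, the same start and end dates can be read directly off the run lengths without splitting. The sketch below is one possible base-R variant, not the method from the answer above; it assumes td is converted to Date, and the names end_idx, start_idx, and phases are made up for this example.

```r
# Toy data matching the reproducible example above
X <- c(FALSE, FALSE, FALSE, TRUE, TRUE, FALSE)
td <- as.Date(c("2000-01-31", "2000-02-29", "2000-03-31",
                "2000-04-30", "2000-05-31", "2000-06-30"))

runlens <- rle(X)

# The cumulative sum of run lengths is the position of each run's last element;
# stepping back by the run length (plus one) gives the position of its first
end_idx <- cumsum(runlens$lengths)
start_idx <- end_idx - runlens$lengths + 1L

phases <- data.frame(
  start = td[start_idx],
  end = td[end_idx],
  type = ifelse(runlens$values, "contraction", "expansion"),
  stringsAsFactors = FALSE
)
phases
```

Indexing td by position keeps the dates as Date objects, which avoids the character-to-Date conversion that is needed after sapply().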

Here is the code after my slight modification. Thanks, josilber! We usually work with monthly data in cyclical analysis, because dating phases down to the day wouldn't be accurate. Also, the economy is always in either recession or expansion, so a duration of zero wouldn't make sense.

na_ts <- zoo(x = X, order.by = td)

# Split with run-length encoding
runlens <- rle(X)
(ts.spl <- split(na_ts, rep(seq_along(runlens$lengths), times = runlens$lengths)))

dat <- data.frame(start = sapply(ts.spl, start),
                  end = sapply(ts.spl, end),
                  val = ifelse(runlens$values, "contraction", "expansion"))
# With monthly data, each run's length is its duration in months
dat$months <- runlens$lengths
dat
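
Using the run length as the duration in months relies on the series having exactly one observation per month with no gaps. A minimal sketch of how that assumption might be checked, and how the "17 months" style label from the question could be produced, is shown below; month_index, is_monthly, and duration are hypothetical names used only here.

```r
# Toy data matching the reproducible example above
X <- c(FALSE, FALSE, FALSE, TRUE, TRUE, FALSE)
td <- as.Date(c("2000-01-31", "2000-02-29", "2000-03-31",
                "2000-04-30", "2000-05-31", "2000-06-30"))

# Number the months consecutively (year * 12 + month); a gap-free monthly
# series advances by exactly 1 from one observation to the next
month_index <- as.integer(format(td, "%Y")) * 12L + as.integer(format(td, "%m"))
is_monthly <- all(diff(month_index) == 1L)

# Only treat run lengths as months when the series really is monthly
if (!is_monthly) stop("td is not a gap-free monthly sequence")

n <- rle(X)$lengths
duration <- paste(n, ifelse(n == 1L, "month", "months"))
duration
```

If the check fails, the run lengths count observations rather than months, and the duration would need to be computed from the start and end dates instead.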
