Recoding time series data in R

I am trying to recode an existing dataset that has an over-time (time-series) structure. My dataset looks like this:

dput(z)

structure(list(democracy = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 
1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L), year.x = 1967:2008, time = c(1, 2, 3, 4, 5, 6, 7, 8, 
9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 
25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 
41, 42)), .Names = c("democracy", "year.x", "time"), row.names = 176:217, class = "data.frame")

I want to create a new variable, say time.democ, which takes the value zero if democracy==0 but starts counting the time period again, starting from 1, when democracy==1, until democracy==0 again. I'm going to do this for a series of countries, but I assume the generalization is easy enough using ddply once I get this function right. Any suggestions?

I would like to get this:

dput(z)

structure(list(democracy = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 
1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L), year.x = 1967:2008, time = c(1, 2, 3, 4, 5, 6, 7, 8, 
9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 
25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 
41, 42), new.time = c(0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 
0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 
16, 17, 18, 19, 20, 21, 22, 23, 24, 25)), .Names = c("democracy", 
"year.x", "time", "new.time"), row.names = 176:217, class = "data.frame")

Thanks!

You can use rle combined with sequence to do this. rle performs run-length encoding (it returns the length and value of each run of consecutive equal values), while sequence generates a 1:n sequence for each length it is given and concatenates them.

z$new.time <- sequence(rle(z$democracy)$lengths)
z$new.time[z$democracy==0] <- 0

head(z, 20)

    democracy year.x time new.time
176         0   1967    1        0
177         0   1968    2        0
178         0   1969    3        0
179         0   1970    4        0
180         0   1971    5        0
181         0   1972    6        0
182         1   1973    7        1
183         1   1974    8        2
184         1   1975    9        3
185         0   1976   10        0
186         0   1977   11        0
187         0   1978   12        0
188         0   1979   13        0
189         0   1980   14        0
190         0   1981   15        0
191         0   1982   16        0
192         1   1983   17        1
193         1   1984   18        2
194         1   1985   19        3
195         1   1986   20        4
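To see why the rle/sequence combination restarts the counter, it may help to look at the intermediate values on a small vector. This is only a minimal sketch: the toy `democracy` vector and the names `runs`, `counter`, and `spell_time` are made up for illustration and are not part of the original data.

```r
# Toy indicator (hypothetical): two spells of 1 separated by a run of 0
democracy <- c(0L, 0L, 1L, 1L, 1L, 0L, 1L, 1L)

# Run-length encoding: one entry per run of consecutive equal values
runs <- rle(democracy)
runs$lengths   # 2 3 1 2
runs$values    # 0 1 0 1

# sequence() concatenates 1:n for each run length,
# so the counter restarts whenever the indicator changes
counter <- sequence(runs$lengths)   # 1 2 1 2 3 1 1 2

# Keep the counter only inside runs of 1; runs of 0 are set to 0
spell_time <- ifelse(democracy == 0L, 0L, counter)
spell_time   # 0 0 1 2 3 0 1 2
```

Note that the counter is computed for runs of 0 as well, which is why the second step that zeroes them out is needed.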

Thanks for your replies. I followed your suggestions and ended up writing a function so that I can apply this to all units in my (longitudinal) data set via ddply. I am posting it as it might help someone else, though I am sure there are more elegant solutions:

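library(plyr)  # provides ddply() and .()

# a: long-format data frame holding the rows of a single country
new.time <- function(a){
    a <- a[order(a$year.x),]   # sort by year so that runs are consecutive in time
    # counter restarts at each change in democracy;
    # the -1 makes the first year of each democratic spell 0
    a$new.time <- sequence(rle(a$democracy)$lengths)-1
    a$new.time[a$democracy==0] <- 0   # non-democratic years are always 0
    return(a)
}

# apply the function separately to each country
merged1 <- ddply(merged, .(country.x), new.time)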
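For anyone who would rather not depend on plyr, the same per-country idea can probably be sketched in base R with ave(). Everything below is an illustrative assumption: the `panel` data frame, its column names, and the helper `spell_counter` are hypothetical, and the counter starts at 1 in the first democratic year (to start at 0 instead, subtract 1 from `counter` before zeroing the non-democratic years).

```r
# Hypothetical two-country panel in long format
panel <- data.frame(
  country   = rep(c("A", "B"), each = 5),
  year      = rep(2001:2005, times = 2),
  democracy = c(0L, 1L, 1L, 0L, 1L,
                1L, 1L, 0L, 0L, 1L)
)

# Counter that restarts at each run and is 0 wherever x == 0
spell_counter <- function(x) {
  counter <- sequence(rle(x)$lengths)
  counter[x == 0] <- 0L
  counter
}

# Sort first so rows are in time order within each country
panel <- panel[order(panel$country, panel$year), ]

# ave() applies the function within each country and
# returns the results aligned with the original rows
panel$new.time <- ave(panel$democracy, panel$country, FUN = spell_counter)

panel$new.time   # 0 1 2 0 1 1 2 0 0 1
```

Because ave() returns a vector of the same length as its input, the result can be assigned straight back as a column without reassembling the per-country pieces.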
