How to calculate durations from monthly enrollment data?

Question

I am attempting to take monthly data on enrollment in different programs and turn it into durations (spells) for each "idnum". For example:

row    idnum    date      program
 1     00001    201301    1  
 2     00001    201302    1  
 3     00001    201303    1   
 4     00001    201306    1
 5     00001    201307    1
 6     00002    201301    1
 7     00002    201302    1
 8     00002    201304    1
 9     00002    201305    1
10     00002    201307    1
11     00002    201308    1

"idnum" 00001 is enrolled in program "1" from 201301 to 201303 (a duration of 3 months) and from 201306 to 201307 (a duration of 2 months).

"idnum" 00002 is enrolled in program "1" from 201301 to 201302 (2 months), from 201304 to 201305 (2 months), and from 201307 to 201308 (2 months).

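A minimal base R sketch of the rule these two examples follow, assuming the dates of one idnum/program are passed in at a time and that no month appears twice: turn each yyyymm value into a running month count, start a new spell wherever the step from the previous month is not 1, and count the months in each spell. `month_index()` and `spells_for_id()` are hypothetical helper names, not from the question.

```r
# Hypothetical helper: yyyymm -> running month count (year * 12 + month),
# so that December -> January is a step of exactly 1
month_index <- function(yyyymm) {
  (yyyymm %/% 100) * 12 + yyyymm %% 100
}

# Hypothetical helper: spells of consecutive months for ONE idnum/program.
# Assumes no month appears twice.
spells_for_id <- function(yyyymm) {
  dates <- sort(yyyymm)
  m <- month_index(dates)
  # Spell number increases by 1 wherever the gap to the previous month is not 1
  spell <- cumsum(c(1, diff(m) != 1))
  data.frame(
    start    = dates[!duplicated(spell)],  # first month of each spell
    duration = rle(spell)$lengths          # number of months in each spell
  )
}

# idnum 00001 from the question
spells_for_id(c(201301, 201302, 201303, 201306, 201307))
```

For idnum 00001 this returns starts 201301 and 201306 with durations 3 and 2, matching the first two rows of the result asked for below.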
What I would like to get is a result like the following:

idnum    program    start     duration
00001    1          201301    3
00001    1          201306    2
00002    1          201301    2
00002    1          201304    2
00002    1          201307    2

Any help would be greatly appreciated! Thanks in advance for your advice.

Answer 1

I like using the data.table package when I need to compare different rows.

library(data.table)

# Create fake data (rows are assumed to be sorted by idnum, then date)
data <- data.table(row = 1:11, idnum = c(rep("00001", 5), rep("00002", 6)),
                   date = c(201301:201303, 201306:201307, 201301:201302,
                            201304:201305, 201307:201308),
                   program = 1)

# Convert yyyymm to a running month count so that December -> January is a step of 1
data[, mon := (date %/% 100L) * 12L + date %% 100L]

# Develop order by id
data[, order := seq_len(.N), by = "idnum"]

# Calculate the month difference to the next row within each id
data[, date.diff := mon[order + 1] - mon[order], by = "idnum"]

# Lag date diff (the last row of each id is NA, so ids do not bleed into each other)
data[, date.diff2 := c(NA, date.diff[-.N])]
data[is.na(date.diff2), date.diff2 := 0L]

# Develop new start variable to account for interruptions within id
data[, new.start := 1 * (date.diff2 > 1)]
data[, start.group := cumsum(new.start)]

# Develop result: one row per id and spell
result <- data[, list(program = program[1], start = date[1],
                      duration = mon[.N] - mon[1] + 1),
               by = c("idnum", "start.group")]

result
   idnum start.group program  start duration
1: 00001           0       1 201301        3
2: 00001           1       1 201306        2
3: 00002           1       1 201301        2
4: 00002           2       1 201304        2
5: 00002           3       1 201307        2

Answer 2

It's not super easy in base R to apply different summary functions in one aggregation, and your yyyymm date values don't make gaps easy to find in that form, but here's one strategy:

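# Example data from the question (rows sorted by date within each idnum)
dd <- data.frame(idnum = c(rep("00001", 5), rep("00002", 6)),
                 date = c(201301:201303, 201306:201307, 201301:201302,
                          201304:201305, 201307:201308),
                 program = 1)

# Helper function to convert yyyymm dates to a continuous month count
tomonthseq <- function(x) {
    x <- as.character(x)
    yr <- as.numeric(substr(x, 1, 4))
    mo <- as.numeric(substr(x, 5, 6))
    ryr <- min(yr)
    mo + (yr - ryr) * 12
}

# Find an ID for each id/program/consecutive-month sequence:
# the counter increases whenever the step to the previous month is not 1
sq <- with(dd, ave(tomonthseq(as.character(date)), idnum, program, FUN = function(x) {
    cumsum(c(0, diff(x) != 1))
}))

# Do the aggregation: first date and number of months in each sequence
xx <- aggregate(date ~ sq + program + idnum, dd,
    function(x) {cbind(x[1], length(x))})

# Clean up after the aggregation mess: aggregate() stores the two
# summary values as a single matrix column, so name its columns
xx <- cbind(xx[, 3:2],
    structure(xx[, 4], .Dimnames = list(NULL, c("start", "duration"))))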

And the output is:

  idnum program  start duration
1 00001       1 201301        3
2 00001       1 201306        2
3 00002       1 201301        2
4 00002       1 201304        2
5 00002       1 201307        2
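The reason Answer 2 converts dates with tomonthseq() before calling diff() shows up as soon as the data cross a year boundary. A quick base R check with made-up dates (illustrative only, not from the question):

```r
# Made-up consecutive months that straddle a year boundary, stored as yyyymm
dates <- c(201311, 201312, 201401)

# Raw differences: the December -> January step looks like a gap
raw_steps <- diff(dates)
raw_steps    # 1 89

# Same idea as tomonthseq(): count months from a fixed origin (year * 12 + month)
month_seq <- (dates %/% 100) * 12 + dates %% 100
month_steps <- diff(month_seq)
month_steps  # 1 1
```

The month count here is year * 12 + month rather than tomonthseq()'s offset from the earliest year, but the two differ only by a constant, so their differences are identical.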


Answer 1
1 vote, accepted, 2014-05-24 10:25:13

Answer 2
0 votes, 2014-05-24 05:17:49
