
R conditional variable for longitudinal data

I have monthly data for one year on insured people. All variables are dummy variables, and I need to create a new variable that shows when a person became uninsured. I am calling the variable duration. My dataset (df) looks something like this:

ID   Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
101    1   1   1   1   0   0   1   1   1   1   1   1
102    1   1   1   1   0   0   0   0   0   0   0   0
103    1   1   1   1   1   1   1   1   1   1   1   1
104    1   1   1   1   0   1   1   0   1   1   1   1

In the dataset, 1 means insured and 0 means uninsured. The new variable should hold the column position (month number) at which the person first changed from 1 to 0. For instance, in the first row, duration would have the value 5, for May. I am only interested in the first instance of 0. For example, in row 4 (ID 104), I only need 5 for May and can ignore August. Also, if the person never becomes uninsured, as in the case of ID 103, the new variable should just have the value 0.

I began with the ifelse statement below, but repeating it for every month would take a lot of time. If you have an easier solution for this, please share. Thanks!

df$duration = ifelse(df$Feb == 1, 0, 2)

Another idea that seems valid:

tmp = !DF[-1]
max.col(tmp, "first") * as.logical(rowSums(tmp))
#[1] 5 5 0 5

where DF is:

DF = structure(list(ID = 101:104, Jan = c(1L, 1L, 1L, 1L), Feb = c(1L, 
1L, 1L, 1L), Mar = c(1L, 1L, 1L, 1L), Apr = c(1L, 1L, 1L, 1L), 
    May = c(0L, 0L, 1L, 0L), Jun = c(0L, 0L, 1L, 1L), Jul = c(1L, 
    0L, 1L, 1L), Aug = c(1L, 0L, 1L, 0L), Sep = c(1L, 0L, 1L, 
    1L), Oct = c(1L, 0L, 1L, 1L), Nov = c(1L, 0L, 1L, 1L), Dec = c(1L, 
    0L, 1L, 1L)), .Names = c("ID", "Jan", "Feb", "Mar", "Apr", 
"May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"), class = "data.frame", row.names = c(NA, 
-4L))
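To make the max.col approach above easier to follow, here is a sketch that splits it into named steps. The variable names (uninsured, first_gap, ever_uninsured) are illustrative and not part of the original answer, and the data frame is rebuilt with data.frame() so the block runs on its own.

```r
# Rebuild the example data (same values as DF above)
DF <- data.frame(
  ID  = 101:104,
  Jan = c(1L, 1L, 1L, 1L), Feb = c(1L, 1L, 1L, 1L),
  Mar = c(1L, 1L, 1L, 1L), Apr = c(1L, 1L, 1L, 1L),
  May = c(0L, 0L, 1L, 0L), Jun = c(0L, 0L, 1L, 1L),
  Jul = c(1L, 0L, 1L, 1L), Aug = c(1L, 0L, 1L, 0L),
  Sep = c(1L, 0L, 1L, 1L), Oct = c(1L, 0L, 1L, 1L),
  Nov = c(1L, 0L, 1L, 1L), Dec = c(1L, 0L, 1L, 1L)
)

# TRUE wherever the person is uninsured (value 0); drop the ID column first
uninsured <- as.matrix(DF[-1]) == 0

# Position of the first TRUE in each row. A row with no TRUE at all
# returns 1 here, because every entry ties for the maximum.
first_gap <- max.col(uninsured, ties.method = "first")

# Flag rows that contain at least one 0, then zero out the others
ever_uninsured <- rowSums(uninsured) > 0
DF$duration <- first_gap * ever_uninsured

DF$duration
#[1] 5 5 0 5
```

Multiplying by the logical flag is what turns the misleading 1 for ID 103 into the requested 0.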

There are more efficient alternatives, but maybe this is sufficient (note that it returns NA, not 0, for a person who never becomes uninsured):

apply(DF[,-1], 1, function(x) which(x==0)[1])
#[1]  5  5 NA  5
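Since the question asks for 0 rather than NA when a person never becomes uninsured, the apply() result can be post-processed. The following is a minimal sketch, assuming the same example data; the name first_zero is illustrative only.

```r
# Rebuild the example data (same values as DF above)
DF <- data.frame(
  ID  = 101:104,
  Jan = c(1L, 1L, 1L, 1L), Feb = c(1L, 1L, 1L, 1L),
  Mar = c(1L, 1L, 1L, 1L), Apr = c(1L, 1L, 1L, 1L),
  May = c(0L, 0L, 1L, 0L), Jun = c(0L, 0L, 1L, 1L),
  Jul = c(1L, 0L, 1L, 1L), Aug = c(1L, 0L, 1L, 0L),
  Sep = c(1L, 0L, 1L, 1L), Oct = c(1L, 0L, 1L, 1L),
  Nov = c(1L, 0L, 1L, 1L), Dec = c(1L, 0L, 1L, 1L)
)

# Column position of the first 0 in each row; NA if the row has no 0
first_zero <- apply(DF[, -1], 1, function(x) which(x == 0)[1])

# Replace NA (never uninsured) with 0, as requested in the question
first_zero[is.na(first_zero)] <- 0L

# Store the result as the new column
DF$duration <- first_zero

DF$duration
#[1] 5 5 0 5
```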

