
Identifying breaks in consecutive values in R

I have a data frame in R similar to the one below, where the columns are year and week numbers and every row is a specific person. To get the relevant input data for the specific IDs, I have indicators of whether the person was unemployed in 2015 or 2016, IND15 and IND16. If the observation is '1' the person is unemployed, and if the observation is '0' the person is employed:

ID  y12_01  y12_02  y12_03  y12_04... y12_51  y12_52 y13_01 IND12 IND13  
01    1       1       1       0         0       1        1    1    1   
02    1       1       1       1         1       1        1    1    1   
03    0       0       1       1         0       0        1    1    1   

As you can see in the examples above, some of the rows show unemployment in both 2012 and 2013. If the person has an unbroken sequence of unemployment (only 1s) beginning in 2015, I would like to create an indicator of this, and if the person has a 'break' in the sequence (e.g. ID01 or ID03), I would like to create an indicator of that as well.

I suspect part of the solution could include rowSums or a while loop, but I have not had any luck so far. In SAS I think one would be able to use an array, but I am not sure how this would be done in R.

For the first part of the question, try df[df$IND15 == 1 & df$IND16 == 1, "Indicator1"] <- 1 .
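Note that the subset assignment above leaves Indicator1 as NA in every row that does not match the condition. If a plain 0/1 column is preferred, a minimal sketch could look like the following; the two-row data frame is made up for illustration, and only the column names IND15 and IND16 come from the question.

```r
# Hypothetical two-row example; only IND15 and IND16 are taken from the question.
df <- data.frame(ID = c("01", "02"), IND15 = c(1, 1), IND16 = c(1, 0))

# The comparison yields TRUE/FALSE; as.integer() maps that to 1/0,
# so non-matching rows get 0 instead of NA.
df$Indicator1 <- as.integer(df$IND15 == 1 & df$IND16 == 1)
```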

For the second part, you should be able to do it with a for loop:

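# x is the index of the last week column; column 1 holds the ID, so start at 2
for (i in seq_len(nrow(df))) {
  if (any(df[i, 2:x] == 0)) {   # TRUE if at least one week is 0 (employed)
    df[i, "Indicator2"] <- 1
  }
}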
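The same check can probably be done without a loop, since the question already mentions rowSums. The sketch below is one possible vectorized version. The small data frame and the regular expression used to pick the week columns are assumptions for illustration; the week column names follow the yYY_WW pattern shown in the question.

```r
# Hypothetical example data using the yYY_WW column naming from the question.
df <- data.frame(
  ID     = c("01", "02", "03"),
  y12_01 = c(1, 1, 0),
  y12_02 = c(1, 1, 0),
  y12_03 = c(1, 1, 1),
  y12_04 = c(0, 1, 1)
)

# Assumption: week columns are exactly those named like y12_01, y13_52, ...
week_cols <- grep("^y[0-9]{2}_[0-9]{2}$", names(df))

# Count the zeros in each row; at least one zero means a break in unemployment.
df$Indicator2 <- as.integer(rowSums(df[, week_cols] == 0) > 0)
```

Unlike the loop, this gives 0 rather than NA for rows without a break.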

If you wish to retain the wide format, one way to create the indicator would be to multiply the columns. Using the following example data,

d <- read.table(text = "ID  y12_01  y12_02  y12_03  y12_04  y12_51  y12_52 y13_01 IND15 IND16  
01    1       1       1       0         0       1        1    1    1   
02    1       1       1       1         1       1        1    1    1   
03    0       0       1       1         0       0        1    1    1", 
  header = TRUE, stringsAsFactors = FALSE)

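where the relevant columns are assumed to be columns 2 to 7 and the values are assumed to be numeric, we can create an indic column. The row-wise product is 1 only when every week is 1 (unbroken unemployment) and becomes 0 as soon as any week is 0: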

d$indic <- Reduce(`*`, d[, 2:7])
d
#   ID y12_01 y12_02 y12_03 y12_04 y12_51 y12_52 y13_01 IND15 IND16 indic
# 1  1      1      1      1      0      0      1      1     1     1     0
# 2  2      1      1      1      1      1      1      1     1     1     1
# 3  3      0      0      1      1      0      0      1     1     1     0
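If it is also useful to know how long the initial unemployment spell lasts before the first break, base R's rle() (run-length encoding) may help. The following is only a sketch under the same assumptions as above (columns 2 to 7, numeric values, no missing values); the helper name leading_run and the new column names are made up for illustration.

```r
# Example data as in the answer above.
d <- read.table(text = "ID y12_01 y12_02 y12_03 y12_04 y12_51 y12_52 y13_01 IND15 IND16
01 1 1 1 0 0 1 1 1 1
02 1 1 1 1 1 1 1 1 1
03 0 0 1 1 0 0 1 1 1", header = TRUE)

week_cols <- 2:7  # assumption: same relevant columns as in the answer

# Hypothetical helper: length of the run of 1s at the start of a row
# (0 if the row starts with 0). Assumes there are no NA values.
leading_run <- function(x) {
  r <- rle(as.integer(x))
  if (r$values[1] == 1) r$lengths[1] else 0L
}

d$first_spell <- apply(d[, week_cols], 1, leading_run)

# A break exists whenever the initial spell does not cover all weeks.
d$has_break <- as.integer(d$first_spell < length(week_cols))
```

For these three rows, has_break is the complement of the indic column computed with Reduce.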
