Recode the first row for each group in R?

I'm trying to understand how to recode the first row of each group in R. I know how to grab the group, and I thought a quick ifelse statement would do it, but I think I'm approaching this the wrong way. Here is the sample:

library(data.table)


# generate N sorted random timestamps between st and et
latemail <- function(N, st = "2012/01/01", et = "2012/02/01") {
  st <- as.POSIXct(as.Date(st))
  et <- as.POSIXct(as.Date(et))
  dt <- as.numeric(difftime(et, st, units = "secs"))
  ev <- sort(runif(N, 0, dt))
  st + ev
}

# create our data.table
set.seed(42)
dt = latemail(20)
work = setDT(as.data.frame(dt))
work[, worker := stringi::stri_rand_strings(2, 5)]  # 2 random IDs, recycled over the 20 rows
work[, dt := as.POSIXct(as.character(dt), tz = "GMT")]
work[, status := NA_character_]  # character NA, so "end"/"start" can be assigned with :=

# order by worker, then by time
setorder(work, worker, dt)

# add work times (mark the end of each work period)
work[c(5, 10, 15, 20), status := "end"]

I am looking for the final product to look like this: every worker's first row should be coded "start", as should every row that immediately follows an "end":

                     dt worker status
 1: 2012-01-04 23:11:31  VOuRp  start
 2: 2012-01-09 15:53:16  VOuRp     NA
 3: 2012-01-15 02:56:45  VOuRp     NA
 4: 2012-01-16 21:12:26  VOuRp     NA
 5: 2012-01-20 16:27:31  VOuRp    end
 6: 2012-01-22 15:34:05  VOuRp  start
 7: 2012-01-23 15:01:18  VOuRp     NA
 8: 2012-01-29 03:36:56  VOuRp     NA
 9: 2012-01-29 20:11:02  VOuRp     NA
10: 2012-01-31 02:48:01  VOuRp    end
11: 2012-01-04 10:24:38  u8zw5  start
12: 2012-01-08 17:02:20  u8zw5     NA
13: 2012-01-14 23:33:35  u8zw5     NA
14: 2012-01-15 12:23:52  u8zw5     NA
15: 2012-01-18 03:53:15  u8zw5    end
16: 2012-01-21 03:48:08  u8zw5  start
17: 2012-01-23 02:01:10  u8zw5     NA
18: 2012-01-26 12:51:10  u8zw5     NA
19: 2012-01-29 18:23:46  u8zw5     NA
20: 2012-01-29 22:22:14  u8zw5    end

How would I approach this, preferably in data tables?

You can use base R in the i argument to select the rows, then use := to assign the "start" values.

work[c(1, head(which(status == "end" & !is.na(status)) + 1, -1)), status := "start"]

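Here, c(1, head(which(status == "end" & !is.na(status)) + 1, -1)) returns a numeric vector of the row positions to fill. which selects the positions where status is "end" (it already skips NA comparisons, so the !is.na(status) check is only a safeguard), and + 1 moves each of them to the row right after. head with a -1 argument drops the last of those positions, because it would point one row past the end of the data.table, and c(1, ...) adds the first row. Note that this works by position rather than by group: the first row of each later worker is picked up only because the previous worker's last row is "end".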
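To see what the index expression does step by step, here is a minimal base R sketch on a made-up status vector (two hypothetical groups of three rows, each closed by "end"). It is only an illustration of the same which / + 1 / head(-1) logic, not the data from the question.

```r
# Hypothetical toy status column: two groups of three rows, each ending in "end"
status <- c(NA, NA, "end", NA, NA, "end")

ends <- which(status == "end")    # which() ignores the NA comparisons: 3 6
after_end <- ends + 1             # rows right after each "end": 4 7
idx <- c(1, head(after_end, -1))  # keep row 1, drop 7 (past the last row)
idx
#> [1] 1 4

status[idx] <- "start"
# status is now: "start" NA "end" "start" NA "end"
```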

This returns

work
                     dt worker status
 1: 2012-01-04 23:11:31  VOuRp  start
 2: 2012-01-09 15:53:16  VOuRp     NA
 3: 2012-01-15 02:56:45  VOuRp     NA
 4: 2012-01-16 21:12:26  VOuRp     NA
 5: 2012-01-20 16:27:31  VOuRp    end
 6: 2012-01-22 15:34:05  VOuRp  start
 7: 2012-01-23 15:01:18  VOuRp     NA
 8: 2012-01-29 03:36:56  VOuRp     NA
 9: 2012-01-29 20:11:02  VOuRp     NA
10: 2012-01-31 02:48:01  VOuRp    end
11: 2012-01-04 10:24:38  u8zw5  start
12: 2012-01-08 17:02:20  u8zw5     NA
13: 2012-01-14 23:33:35  u8zw5     NA
14: 2012-01-15 12:23:52  u8zw5     NA
15: 2012-01-18 03:53:15  u8zw5    end
16: 2012-01-21 03:48:08  u8zw5  start
17: 2012-01-23 02:01:10  u8zw5     NA
18: 2012-01-26 12:51:10  u8zw5     NA
19: 2012-01-29 18:23:46  u8zw5     NA
20: 2012-01-29 22:22:14  u8zw5    end
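The positional expression assumes that every worker's last row is "end". If that might not hold, a group-aware variant can apply the rule within each worker using by = worker together with data.table's shift() (lag by one row) and %chin%. This is a sketch on a hypothetical toy table where worker "B" does not finish with "end"; it also relies on fifelse(), which is missing from very old data.table versions.

```r
library(data.table)

# Hypothetical toy data: worker B's last row is NOT "end"
work <- data.table(
  worker = rep(c("A", "B"), each = 4),
  status = c(NA, "end", NA, "end",  NA, NA, "end", NA)
)

# Within each worker: the first row becomes "start", and so does any row
# whose previous row in the same group is "end". shift() fills NA at the
# top of each group, and NA %chin% "end" is FALSE.
work[, status := {
  prev <- shift(status)
  is_start <- seq_len(.N) == 1L | prev %chin% "end"
  fifelse(is_start, "start", status)
}, by = worker]

work$status
# "start" "end" "start" "end" "start" NA "end" "start"
```

On this toy table, row 8 (worker B's last row, right after an "end") becomes "start", whereas the positional expression would drop it with head(-1), because that step assumes the final "end" sits on the last row of the table.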
