简体   繁体   中英

R : Repeat rows in a data frame with new set of values for certain columns

employee <- c("John", "Adi", "Sam")
salary <- c(21000, 22000, 23000)
startdate <- as.Date(c("2014-11-01","2014-01-01","2014-10-01"))
enddate <- as.Date(c("2015-10-31","2014-12-31","2015-10-31"))
N<- c(2,1,2)
df<- data.frame(employee,salary, startdate, enddate, N)

I want to repeat the entire row by "n" number of times where "n" is specified in the column N but I want to change the enddate in the original row to be fixed date such as "31/12/2014" and make this fixed date as startdate in the repeating row. Run the code to see the resultant example (expected output)in df2 :

employee <- c(rep("John",2), "Adi", rep("Sam",2))
salary <- c(21000,21000, 22000, 23000,23000)
startdate <- as.Date(c("2014-11-01","2014-12-31", "2014-01-01","2014-10-01","2014-12-31"))
enddate <- as.Date(c("2014-12-31","2015-10-31","2014-12-31","2014-12-31","2015-10-31"))
N<- c(2,2,1,2,2)
df2<- data.frame(employee,salary, startdate, enddate, N)

We could do this using data.table . We convert the 'data.frame' to 'data.table' ( setDT(df) ), expand the rows by replicating the 'N' variable. We get the numeric index ('i1') of the observation ( .I[1L] ) grouped by 'employee', use that to assign ( := ) the 'enddate' with '2014-12-31'. Similarly, we get the row index ('i2') of 2nd to last element ( .I[seq_len(.N)>1L] ) for each 'employee' and set the 'startdate' as '2014-12-31'.

DT <- setDT(df)[rep(seq_len(.N), N)]
i1 <- DT[,  .I[1L] , by = employee]$V1
DT[i1, enddate:= as.Date('2014-12-31')]
i2 <- DT[, .I[seq_len(.N)>1L], employee]$V1
DT[i2, startdate:= as.Date('2014-12-31')]
identical(as.data.table(df2), DT)
#[1] TRUE

Or we can do this using if and concatenate the '2014-12-31' for the 'startdate' and 'enddate', grouped by 'employee' and then assign the output back to the columns 'startdate', 'enddate'.

DT[, c('startdate', 'enddate') := if(.N>1L) 
            list(c(startdate[1L], as.Date('2014-12-31')),
                 c(as.Date('2014-12-31'), enddate[.N])) , by = employee]
identical(DT, as.data.table(df2))
#[1] TRUE

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM