
How to split a time range into minute-long parts and transform it from wide to long format in R?

I have seen quite a few posts where data is transformed from wide to long format, but my question is a little more complicated. I have a data frame like this:

id  Start_time_of_conversation End_time_of_conversation Participant1 Participant2
id1        07:00                      08:00                     A           B
id2        07:00                      09:00                     C           D
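
For reproducibility, the table above can be rebuilt roughly as follows. This is only a sketch: the name df1 is an assumption (chosen to match the name used in the answer further down), and the times are assumed to be stored as "HH:MM" character strings.

```r
# Assumed reconstruction of the example data; `df1` is a hypothetical name
# and the times are kept as "HH:MM" character strings.
df1 <- data.frame(
  id = c("id1", "id2"),
  Start_time_of_conversation = c("07:00", "07:00"),
  End_time_of_conversation = c("08:00", "09:00"),
  Participant1 = c("A", "C"),
  Participant2 = c("B", "D"),
  stringsAsFactors = FALSE
)
```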

I would like to convert this dataframe into something like this:

id  time(in min)   dataName        dataValue
id1  07:00        Participant1        A
id1  07:00        Participant2        B
id2  07:00        Participant1        C
id2  07:00        Participant2        D
id1  07:01        Participant1        A
id1  07:01        Participant2        B
id2  07:01        Participant1        C
id2  07:01        Participant2        D
...
id2  08:59        Participant1        C
id2  08:59        Participant2        D
id2  09:00        Participant1        C
id2  09:00        Participant2        D

So the goal is not only to change it to long format, but also to generate a new row (two rows, one per participant) for each minute in the range from Start_time_of_conversation to End_time_of_conversation.

I was thinking that maybe I should use seq() and melt() to do it, though I really do not see how I could do it without a lot of patchwork. Should I first create a long format and then convert each range/row into a sequence of minutes, or is there an easier way to do it?

One option is to use data.table. Convert the 'data.frame' to a 'data.table' with setDT(df1). Then, grouped by 'id', 'Participant1' and 'Participant2', build the sequence of minutes between the start and end columns (after converting them to the POSIXlt class with strptime) and format the result back to "%H:%M".

library(data.table)

# For each conversation, build one row per minute from start to end (inclusive)
DT <- setDT(df1)[,
  list(time_in_mins = format(
    seq(strptime(Start_time_of_conversation, format = "%H:%M"),
        strptime(End_time_of_conversation, format = "%H:%M"),
        by = "1 min"),
    "%H:%M")),
  by = .(id, Participant1, Participant2)]
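
The building block here is the per-minute sequence for a single range. A minimal base R sketch of just that step is shown below. The variable names start, end and minutes are illustrative, and tz = "UTC" is an assumption added so that the result does not depend on the local time zone (strptime without a date uses the current day).

```r
# Per-minute sequence for one range, base R only.
# tz = "UTC" is an assumption to keep the result independent of local DST rules.
start <- as.POSIXct(strptime("07:00", format = "%H:%M", tz = "UTC"))
end   <- as.POSIXct(strptime("08:00", format = "%H:%M", tz = "UTC"))

# seq() includes both endpoints; format() turns the times back into "HH:MM" strings
minutes <- format(seq(start, end, by = "1 min"), "%H:%M")

length(minutes)   # 61
head(minutes, 3)  # "07:00" "07:01" "07:02"
```

Because both endpoints are included, a one-hour range yields 61 values rather than 60.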

Using the above output, we apply melt to reshape the 'Participant' columns into long format, and order the result if necessary.

# Reshape the two participant columns into dataName/dataValue pairs, then sort
melt(DT, id.vars = c("id", "time_in_mins"),
     variable.name = "dataName",
     value.name = "dataValue")[order(time_in_mins, id, dataName)]
#      id time_in_mins     dataName dataValue
#  1: id1        07:00 Participant1         A
#  2: id1        07:00 Participant2         B
#  3: id2        07:00 Participant1         C
#  4: id2        07:00 Participant2         D
#  5: id1        07:01 Participant1         A
# ---                                        
#360: id2        08:58 Participant2         D
#361: id2        08:59 Participant1         C
#362: id2        08:59 Participant2         D
#363: id2        09:00 Participant1         C
#364: id2        09:00 Participant2         D
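
As a cross-check of the row count, the same result can be sketched in base R without data.table. This is an alternative illustration, not the method used above: expand_row is a hypothetical helper, df1 is an assumed reconstruction of the example data, and tz = "UTC" is an added assumption.

```r
# Assumed reconstruction of the example data
df1 <- data.frame(
  id = c("id1", "id2"),
  Start_time_of_conversation = c("07:00", "07:00"),
  End_time_of_conversation = c("08:00", "09:00"),
  Participant1 = c("A", "C"),
  Participant2 = c("B", "D"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: expand one row into one record per minute and participant
expand_row <- function(row) {
  to_time <- function(x) as.POSIXct(strptime(x, format = "%H:%M", tz = "UTC"))
  minutes <- format(seq(to_time(row$Start_time_of_conversation),
                        to_time(row$End_time_of_conversation),
                        by = "1 min"), "%H:%M")
  data.frame(
    id = row$id,
    time_in_mins = rep(minutes, each = 2),
    dataName = rep(c("Participant1", "Participant2"), times = length(minutes)),
    dataValue = rep(c(row$Participant1, row$Participant2), times = length(minutes)),
    stringsAsFactors = FALSE
  )
}

# Expand every row, stack the pieces, and sort like the output above
long <- do.call(rbind, lapply(seq_len(nrow(df1)), function(i) expand_row(df1[i, ])))
long <- long[order(long$time_in_mins, long$id, long$dataName), ]
rownames(long) <- NULL

nrow(long)  # 364 = (61 + 121) minutes * 2 participants
```

The count matches the 364 rows printed above: id1 contributes 61 minutes and id2 contributes 121, each with two participants.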
