Event data from rows to columns by ID in R, Reshape?

I have several thousand rows of data in this form:

a= c("id", "start", "mid1", "mid2", "finish")
b= c("id1", "date1", "date2", "date3",  "date4")
c= c("id2", "date5", "date6", NA, "date7")
d= c("id3", "date8", "date9", "date10", "date11")

df=as.data.frame(rbind(b,c,d))
colnames(df)=a
rownames(df)=c(1:nrow(df))

df

#    id start  mid1   mid2 finish
# 1 id1 date1 date2  date3  date4
# 2 id2 date5 date6   <NA>  date7
# 3 id3 date8 date9 date10 date11
# ...

I need to convert it to this form, with one row per id and event and the missing dates dropped:

id;  event  ;date
id1; start  ;date1
id1; mid1   ;date2
id1; mid2   ;date3
id1; finish ;date4
id2; start  ;date5
id2; mid1   ;date6 
id2; finish ;date7
id3; start  ;date8
id3; mid1   ;date9  
id3; mid2   ;date10  
id3; finish ;date11
...

I found this question which was almost the same but the other way around: How to transform Columns to rows in R?

How could I accomplish the transformation?

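As mentioned in the comments, you can use tidyr::gather. Here I use it in combination with dplyr and chain the steps together with %>%: gather reshapes the data to long format, arrange sorts the rows by id, and filter drops the rows with a missing date.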

library(tidyr); library(dplyr)

df %>%   
    gather(event, date, -id) %>%   
    arrange(id) %>%   
    filter(!is.na(date))

which results in

    id  event   date
1  id1  start  date1
2  id1   mid1  date2
3  id1   mid2  date3
4  id1 finish  date4
5  id2  start  date5
6  id2   mid1  date6
7  id2 finish  date7
8  id3  start  date8
9  id3   mid1  date9
10 id3   mid2 date10
11 id3 finish date11
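
In tidyr 1.0.0 and later, gather is superseded by pivot_longer. The following is a minimal sketch of the same reshape with that function, assuming a recent tidyr is installed. The data frame is rebuilt from the question's example, and the name long is only illustrative.

```r
library(tidyr)

# Example data from the question, rebuilt as character columns
df <- data.frame(
  id     = c("id1", "id2", "id3"),
  start  = c("date1", "date5", "date8"),
  mid1   = c("date2", "date6", "date9"),
  mid2   = c("date3", NA, "date10"),
  finish = c("date4", "date7", "date11"),
  stringsAsFactors = FALSE
)

# Every column except id becomes an (event, date) pair;
# values_drop_na = TRUE drops the rows whose date is NA
long <- pivot_longer(
  df,
  cols           = -id,
  names_to       = "event",
  values_to      = "date",
  values_drop_na = TRUE
)

print(long)
```

pivot_longer walks the input row by row, so the events for each id stay together in column order and no separate arrange step should be needed here.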

Your original data needs NA rather than blank values for the missing dates. Then, as Davide suggested, use melt with na.rm=TRUE to drop the NA rows and obtain the result you want:

> df
   id start  mid1   mid2 finish
1 id1 date1 date2  date3  date4
2 id2 date5 date6   <NA>  date7
3 id3 date8 date9 date10 date11

library(reshape2)

melt(df, id.vars = "id", variable.name = "event", value.name = "date", na.rm = TRUE)
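
melt returns the rows grouped by event rather than by id. If the row order from the question matters, one possible follow-up step is to sort by id and event, as sketched below. The data frame is rebuilt from the question's example, and the name long is only illustrative.

```r
library(reshape2)

# Example data from the question, rebuilt as character columns
df <- data.frame(
  id     = c("id1", "id2", "id3"),
  start  = c("date1", "date5", "date8"),
  mid1   = c("date2", "date6", "date9"),
  mid2   = c("date3", NA, "date10"),
  finish = c("date4", "date7", "date11"),
  stringsAsFactors = FALSE
)

# Wide to long, dropping rows with a missing date
long <- melt(df, id.vars = "id", variable.name = "event",
             value.name = "date", na.rm = TRUE)

# event is a factor whose levels follow the original column order,
# so ordering by it keeps start, mid1, mid2, finish in sequence
long <- long[order(long$id, long$event), ]
rownames(long) <- NULL

print(long)
```

Note that id is sorted as text here, so with real ids such as "id10" and "id2" the order would be alphabetical rather than numeric.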

For variety's sake, you can do the following in base R:

cbind(df[1], stack(lapply(df[-1], as.character)), row.names = NULL)
#     id values    ind
# 1  id1  date1  start
# 2  id2  date5  start
# 3  id3  date8  start
# 4  id1  date2   mid1
# 5  id2  date6   mid1
# 6  id3  date9   mid1
# 7  id1  date3   mid2
# 8  id2   <NA>   mid2
# 9  id3 date10   mid2
# 10 id1  date4 finish
# 11 id2  date7 finish
# 12 id3 date11 finish

You can wrap the result in na.omit to drop the NA row, and use order to put the rows in the order you want.
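
A minimal sketch of those two follow-up steps in base R is shown below. The data frame is rebuilt from the question's example, and the name long and the renaming of the columns to event and date are illustrative choices, not part of the original answer.

```r
# Example data from the question, rebuilt as character columns
df <- data.frame(
  id     = c("id1", "id2", "id3"),
  start  = c("date1", "date5", "date8"),
  mid1   = c("date2", "date6", "date9"),
  mid2   = c("date3", NA, "date10"),
  finish = c("date4", "date7", "date11"),
  stringsAsFactors = FALSE
)

# Stack the date columns; the id column is recycled to match
long <- cbind(df[1], stack(lapply(df[-1], as.character)), row.names = NULL)

# Drop the row with the missing date
long <- na.omit(long)

# ind is a factor in original column order, so sorting by id and ind
# gives start, mid1, mid2, finish within each id
long <- long[order(long$id, long$ind), c("id", "ind", "values")]
names(long) <- c("id", "event", "date")
rownames(long) <- NULL

print(long)
```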
