I have several thousands of rows of data in this form:
a= c("id", "start", "mid1", "mid2", "finish")
b= c("id1", "date1", "date2", "date3", "date4")
c= c("id2", "date5", "date6", NA, "date7")
d= c("id3", "date8", "date9", "date10", "date11")
df=as.data.frame(rbind(b,c,d))
colnames(df)=a
rownames(df)=c(1:nrow(df))
df
# id start mid1 mid2 finish
# 1 id1 date1 date2 date3 date4
# 2 id2 date5 date6 <NA> date7
# 3 id3 date8 date9 date10 date11
# ...
And I would need to have it in this form:
id; event ;date
id1; start ;date1
id1; mid1 ;date2
id1; mid2 ;date3
id1; finish ;date4
id2; start ;date5
id2; mid1 ;date6
id2; finish ;date7
id3; start ;date8
id3; mid1 ;date9
id3; mid2 ;date10
id3; finish ;date11
...
I found this question which was almost the same but the other way around: How to transform Columns to rows in R?
How could I accomplish the transformation?
As mentioned in the comments, you can use tidyr::gather
. Here I use it in combination with dplyr
, and chain it all together with %>%
.
library(tidyr); library(dplyr)
df %>%
gather(event, date, -id) %>%
arrange(id) %>%
filter(!is.na(date))
which results in
id event date
1 id1 start date1
2 id1 mid1 date2
3 id1 mid2 date3
4 id1 finish date4
5 id2 start date5
6 id2 mid1 date6
7 id2 finish date7
8 id3 start date8
9 id3 mid1 date9
10 id3 mid2 date10
11 id3 finish date11
You need to put NA
instead of blank in your original data and as said by Davide use melt
, ignoring NA
to obtain the result you want:
> df
id start mid1 mid2 finish
1 id1 date1 date2 date3 date4
2 id2 date5 date6 <NA> date7
3 id3 date8 date9 date10 date11
library(reshape2)
melt(df, id.vars="id", variable.name="event",value.name="date",na.rm=TRUE)
For variety's sake, you can do the following in base R:
cbind(df[1], stack(lapply(df[-1], as.character)), row.names = NULL)
# id values ind
# 1 id1 date1 start
# 2 id2 date5 start
# 3 id3 date8 start
# 4 id1 date2 mid1
# 5 id2 date6 mid1
# 6 id3 date9 mid1
# 7 id1 date3 mid2
# 8 id2 <NA> mid2
# 9 id3 date10 mid2
# 10 id1 date4 finish
# 11 id2 date7 finish
# 12 id3 date11 finish
You can wrap it in na.omit
if you want to get rid of that NA
and use order
to get the data in the row order that you want.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.