Event data from rows to columns by ID in R: reshape?
I have several thousand rows of data in this form:
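# Column names: an id plus one column per event
a <- c("id", "start", "mid1", "mid2", "finish")
# One character vector per id; NA marks an event that did not occur
# (named r1..r3 rather than b, c, d so the data is not confused with base::c)
r1 <- c("id1", "date1", "date2", "date3", "date4")
r2 <- c("id2", "date5", "date6", NA, "date7")
r3 <- c("id3", "date8", "date9", "date10", "date11")
df <- as.data.frame(rbind(r1, r2, r3), stringsAsFactors = FALSE)
colnames(df) <- a
rownames(df) <- seq_len(nrow(df))
df
#    id start  mid1   mid2 finish
# 1 id1 date1 date2  date3  date4
# 2 id2 date5 date6   <NA>  date7
# 3 id3 date8 date9 date10 date11
# ...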
And I need it in this form:
id; event; date
id1; start; date1
id1; mid1; date2
id1; mid2; date3
id1; finish; date4
id2; start; date5
id2; mid1; date6
id2; finish; date7
id3; start; date8
id3; mid1; date9
id3; mid2; date10
id3; finish; date11
...
I found this question, which is almost the same but the other way around: How to transform Columns to rows in R?
How can I accomplish this transformation?
As mentioned in the comments, you can use tidyr::gather. Here I use it in combination with dplyr and chain everything together with %>%.
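library(tidyr); library(dplyr)

df %>%
  gather(event, date, -id) %>%   # wide -> long: one row per (id, event)
  arrange(id) %>%                # keep each id's events together
  filter(!is.na(date))           # drop events that did not occur (id2's mid2)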
which results in
    id  event   date
1  id1  start  date1
2  id1   mid1  date2
3  id1   mid2  date3
4  id1 finish  date4
5  id2  start  date5
6  id2   mid1  date6
7  id2 finish  date7
8  id3  start  date8
9  id3   mid1  date9
10 id3   mid2 date10
11 id3 finish date11
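Since tidyr 1.0.0, gather() is superseded by pivot_longer(). A minimal sketch of the same reshape with it, assuming a tidyr version that provides pivot_longer(); the data frame is rebuilt here so the snippet runs on its own:

```r
library(tidyr)

# The question's data, rebuilt so this snippet is self-contained
df <- data.frame(
  id     = c("id1", "id2", "id3"),
  start  = c("date1", "date5", "date8"),
  mid1   = c("date2", "date6", "date9"),
  mid2   = c("date3", NA, "date10"),
  finish = c("date4", "date7", "date11"),
  stringsAsFactors = FALSE
)

# Every column except id becomes an (event, date) pair;
# values_drop_na = TRUE drops id2's missing mid2 row
long <- pivot_longer(df, -id, names_to = "event", values_to = "date",
                     values_drop_na = TRUE)
long
```

pivot_longer() emits rows in input-row order, so each id's events already come out together and in column order; that is why this version needs no arrange() step.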
You need to put NA instead of a blank in your original data and, as Davide said, use melt, ignoring the NAs (na.rm = TRUE), to obtain the result you want:
> df
   id start  mid1   mid2 finish
1 id1 date1 date2  date3  date4
2 id2 date5 date6   <NA>  date7
3 id3 date8 date9 date10 date11
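library(reshape2)
# id stays as the identifier; every other column becomes an (event, date) pair,
# and na.rm = TRUE drops the rows whose date is NA
melt(df, id.vars = "id", variable.name = "event", value.name = "date", na.rm = TRUE)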
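The melt() call stacks one column after another, so its rows come out grouped by event rather than by id. A minimal sketch that also covers the "blank instead of NA" case: it assumes a hypothetical variant of the data (called df_blank here, an illustrative name) where the missing date is an empty string, converts blanks to NA, melts, and then reorders the rows by id:

```r
library(reshape2)

# Hypothetical variant of the question's data (df_blank is an illustrative
# name): id2's missing mid2 date is an empty string instead of NA
df_blank <- data.frame(
  id     = c("id1", "id2", "id3"),
  start  = c("date1", "date5", "date8"),
  mid1   = c("date2", "date6", "date9"),
  mid2   = c("date3", "", "date10"),
  finish = c("date4", "date7", "date11"),
  stringsAsFactors = FALSE
)

# Turn blanks into NA so that na.rm = TRUE can drop them
df_blank[df_blank == ""] <- NA

long <- melt(df_blank, id.vars = "id", variable.name = "event",
             value.name = "date", na.rm = TRUE)

# melt() stacks one column after another, so reorder by id and then by the
# event's position among the original columns
event_pos <- match(as.character(long$event), names(df_blank)[-1])
long <- long[order(long$id, event_pos), ]
rownames(long) <- NULL
long
```

Matching on the column names keeps the event order stable whether the event column comes back as a factor or as character.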
For variety's sake, you can do the following in base R:
cbind(df[1], stack(lapply(df[-1], as.character)), row.names = NULL)
#     id values    ind
# 1  id1  date1  start
# 2  id2  date5  start
# 3  id3  date8  start
# 4  id1  date2   mid1
# 5  id2  date6   mid1
# 6  id3  date9   mid1
# 7  id1  date3   mid2
# 8  id2   <NA>   mid2
# 9  id3 date10   mid2
# 10 id1  date4 finish
# 11 id2  date7 finish
# 12 id3 date11 finish
You can wrap it in na.omit if you want to get rid of that NA, and use order to get the rows in the order you want.
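Putting the na.omit and order suggestions together, a minimal base R sketch might look like this. It rebuilds the question's data, and the ordering key (each event's position among the original columns) is one choice among several, used here so the result does not depend on how stack() orders its factor levels:

```r
# The question's data, rebuilt so this snippet is self-contained
df <- data.frame(
  id     = c("id1", "id2", "id3"),
  start  = c("date1", "date5", "date8"),
  mid1   = c("date2", "date6", "date9"),
  mid2   = c("date3", NA, "date10"),
  finish = c("date4", "date7", "date11"),
  stringsAsFactors = FALSE
)

# Stack every non-id column; the id column df[1] is recycled to match
long <- cbind(df[1], stack(lapply(df[-1], as.character)), row.names = NULL)
names(long) <- c("id", "date", "event")

# Drop events that did not occur (the NA in id2's mid2)
long <- na.omit(long)

# Sort by id, then by the event's position among the original columns
event_pos <- match(as.character(long$event), names(df)[-1])
long <- long[order(long$id, event_pos), c("id", "event", "date")]
rownames(long) <- NULL
long
```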
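Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.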