
Event data from rows to columns by ID in R, Reshape?

I have several thousand rows of data in this form:

a= c("id", "start", "mid1", "mid2", "finish")
b= c("id1", "date1", "date2", "date3",  "date4")
c= c("id2", "date5", "date6", NA, "date7")
d= c("id3", "date8", "date9", "date10", "date11")

df=as.data.frame(rbind(b,c,d))
colnames(df)=a
rownames(df)=c(1:nrow(df))

df

#    id start  mid1   mid2 finish
# 1 id1 date1 date2  date3  date4
# 2 id2 date5 date6   <NA>  date7
# 3 id3 date8 date9 date10 date11
# ...

And I need to have it in this form:

id;  event  ;date
id1; start  ;date1
id1; mid1   ;date2
id1; mid2   ;date3
id1; finish ;date4
id2; start  ;date5
id2; mid1   ;date6 
id2; finish ;date7
id3; start  ;date8
id3; mid1   ;date9  
id3; mid2   ;date10  
id3; finish ;date11
...

I found this question, which is almost the same but the other way around: How to transform Columns to rows in R?

How could I accomplish the transformation?

As mentioned in the comments, you can use tidyr::gather. Here I use it in combination with dplyr, and chain it all together with %>%.

library(tidyr); library(dplyr)

df %>%   
    gather(event, date, -id) %>%   
    arrange(id) %>%   
    filter(!is.na(date))

which results in

    id  event   date
1  id1  start  date1
2  id1   mid1  date2
3  id1   mid2  date3
4  id1 finish  date4
5  id2  start  date5
6  id2   mid1  date6
7  id2 finish  date7
8  id3  start  date8
9  id3   mid1  date9
10 id3   mid2 date10
11 id3 finish date11
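
In more recent versions of tidyr, gather is superseded by pivot_longer. A minimal sketch of the same reshape, assuming tidyr 1.0.0 or later is installed; the data frame is rebuilt from the question's example, and the name `long` is only illustrative:

```r
library(tidyr)
library(dplyr)

# Rebuild the example data from the question (dates are placeholder strings)
df <- data.frame(
  id     = c("id1", "id2", "id3"),
  start  = c("date1", "date5", "date8"),
  mid1   = c("date2", "date6", "date9"),
  mid2   = c("date3", NA, "date10"),
  finish = c("date4", "date7", "date11"),
  stringsAsFactors = FALSE
)

# Every column except id becomes an (event, date) pair;
# values_drop_na = TRUE plays the role of filter(!is.na(date))
long <- df %>%
  pivot_longer(
    cols = -id,
    names_to = "event",
    values_to = "date",
    values_drop_na = TRUE
  )

long
```

pivot_longer walks the input row by row, so the events for each id should already be grouped together without a separate arrange step. The result is a tibble rather than a plain data frame.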

You need to put NA instead of a blank in your original data and, as Davide said, use melt with na.rm=TRUE to drop the NA values and obtain the result you want:

> df
   id start  mid1   mid2 finish
1 id1 date1 date2  date3  date4
2 id2 date5 date6   <NA>  date7
3 id3 date8 date9 date10 date11

library(reshape2)

melt(df, id.vars="id", variable.name="event",value.name="date",na.rm=TRUE)

For variety's sake, you can do the following in base R:

cbind(df[1], stack(lapply(df[-1], as.character)), row.names = NULL)
#     id values    ind
# 1  id1  date1  start
# 2  id2  date5  start
# 3  id3  date8  start
# 4  id1  date2   mid1
# 5  id2  date6   mid1
# 6  id3  date9   mid1
# 7  id1  date3   mid2
# 8  id2   <NA>   mid2
# 9  id3 date10   mid2
# 10 id1  date4 finish
# 11 id2  date7 finish
# 12 id3 date11 finish

You can wrap it in na.omit if you want to get rid of that NA, and use order to get the data in the row order that you want.
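
The na.omit and order steps could be sketched as follows. This assumes the example data from the question; the renaming and column reordering at the end are optional additions to match the requested layout, and `long` is an illustrative name:

```r
# Rebuild the example data from the question (dates are placeholder strings)
df <- data.frame(
  id     = c("id1", "id2", "id3"),
  start  = c("date1", "date5", "date8"),
  mid1   = c("date2", "date6", "date9"),
  mid2   = c("date3", NA, "date10"),
  finish = c("date4", "date7", "date11"),
  stringsAsFactors = FALSE
)

# Stack the event columns and drop the row whose date is NA
long <- na.omit(
  cbind(df[1], stack(lapply(df[-1], as.character)), row.names = NULL)
)

# stack() returns ind as a factor whose levels follow the column order,
# so sorting by id and then ind keeps start, mid1, mid2, finish in sequence
long <- long[order(long$id, long$ind), ]

# Optional: rename and reorder columns to id, event, date
names(long) <- c("id", "date", "event")
long <- long[c("id", "event", "date")]
rownames(long) <- NULL

long
```

Note that order sorts a character id column lexicographically, so an id such as "id10" would sort before "id2"; real data may need a numeric or otherwise sortable key.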
