
Spreading a dataframe in R with multiple custom columns

I'm not sure how to approach this problem and would like some insight. I have multiple owners for a unique ID, so the ID appears more than once, one row per owner in the Owner column. When an ID has more than one owner, I would like to spread the Owner and Score columns so that each ID occupies a single row, with the additional owners and scores in extra columns. Any help would be greatly appreciated. Thanks!

This is what it looks like before:

df <- as.data.frame(matrix(NA, nrow = 11, ncol = 3))
df$V1 <- c('A','A','B','C','C','C','D','E','E','E','E')
df$V2 <- c('John','Derek','Sarah','Peter','Carlos','Angela','Ken','James','Nina','Gabby','Seth')
df$V3 <- c(100,90,80,85,66,98,62,74,56,85,77)
colnames(df) <- c('ID','Owner','Score')

This is what I want it to look like after:

df_out <- as.data.frame(matrix(NA,nrow = 5, ncol = 9))
df_out$V1 <- c('A','B','C','D','E')
df_out$V2 <- c('John','Sarah','Peter','Ken','James')
df_out$V3 <- c(100,80,85,62,74)
df_out$V4 <- c('Derek',NA,'Carlos',NA,'Nina')
df_out$V5 <- c(90,NA,66,NA,56)
df_out$V6 <- c(NA,NA,'Angela',NA,'Gabby')
df_out$V7 <- c(NA,NA,98,NA,85)
df_out$V8 <- c(NA,NA,NA,NA,'Seth')
df_out$V9 <- c(NA,NA,NA,NA,77)
colnames(df_out) <- c('ID','Owner','Score','Owner.2','Score.2','Owner.3','Score.3','Owner.4','Score.4')

Please excuse my code, I'm still a beginner!

Here is an option using data.table::dcast, which pivots ID (your row label) against the within-ID row number (your column label), using Owner and Score as the values to be pivoted:

library(data.table)
setDT(df)[, nr := rowid(ID)]
ans <- dcast(df, ID ~ nr, sep=".", value.var=c("Owner","Score"))
ans

Output:

   ID Owner.1 Owner.2 Owner.3 Owner.4 Score.1 Score.2 Score.3 Score.4
1:  A    John   Derek    <NA>    <NA>     100      90      NA      NA
2:  B   Sarah    <NA>    <NA>    <NA>      80      NA      NA      NA
3:  C   Peter  Carlos  Angela    <NA>      85      66      98      NA
4:  D     Ken    <NA>    <NA>    <NA>      62      NA      NA      NA
5:  E   James    Nina   Gabby    Seth      74      56      85      77
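
For readers who want to see what the row-number step is doing without data.table, here is a rough base R sketch of the same idea. It is not part of the answer above: it swaps in ave() for the within-ID counter and stats::reshape() for the pivot, and it rebuilds the question's data so that it runs on its own. The name `wide` is only illustrative.

```r
# Rebuild the question's data as a plain data.frame
df <- data.frame(
  ID    = c('A','A','B','C','C','C','D','E','E','E','E'),
  Owner = c('John','Derek','Sarah','Peter','Carlos','Angela',
            'Ken','James','Nina','Gabby','Seth'),
  Score = c(100,90,80,85,66,98,62,74,56,85,77),
  stringsAsFactors = FALSE
)

# Within-ID counter: 1, 2, ... restarting at each new ID
# (plays the role of rowid(ID) in the data.table version)
df$nr <- ave(seq_along(df$ID), df$ID, FUN = seq_along)

# Long-to-wide reshape: one row per ID, one Owner/Score pair per counter value
wide <- reshape(df, idvar = "ID", timevar = "nr",
                direction = "wide", sep = ".")
wide
```

With reshape() the Owner/Score pairs should already come out interleaved (Owner.1, Score.1, Owner.2, ...), so no separate reordering step is needed in this variant.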

To reorder into your specific column order, you can order the columns by the numeric index in their names (i.e. the .1, .2, .3, etc.), as follows:

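nm <- names(ans)[-1L]
# sort by the numeric suffix; as.integer() keeps .10 after .9 (a character sort would not)
cols <- nm[order(as.integer(sapply(strsplit(nm, "\\."), `[`, 2)))]
setcolorder(ans, c("ID", cols))
ans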

Output:

   ID Owner.1 Score.1 Owner.2 Score.2 Owner.3 Score.3 Owner.4 Score.4
1:  A    John     100   Derek      90    <NA>      NA    <NA>      NA
2:  B   Sarah      80    <NA>      NA    <NA>      NA    <NA>      NA
3:  C   Peter      85  Carlos      66  Angela      98    <NA>      NA
4:  D     Ken      62    <NA>      NA    <NA>      NA    <NA>      NA
5:  E   James      74    Nina      56   Gabby      85    Seth      77

Another option, using dplyr and tidyr:

library(dplyr)
library(tidyr)
df %>% group_by(ID) %>%
       # First collect all Owners and Scores for each ID in one place
       summarise(own = paste0(Owner, collapse = ','), sco = paste0(Score, collapse = ',')) %>%
       # Separate Owners into their specific columns using tidyr::separate;
       # sep = ',' splits only on the commas added above, fill = 'right' pads short IDs with NA
       separate(own, into = c('Owner.1','Owner.2','Owner.3','Owner.4'), sep = ',', fill = 'right') %>%
       separate(sco, into = c('Score.1','Score.2','Score.3','Score.4'), sep = ',', fill = 'right') %>%
       # Rearrange columns as in the question (note the Score columns come back as character)
       select(ID, Owner.1, Score.1, Owner.2, Score.2, Owner.3, Score.3, Owner.4, Score.4)


# A tibble: 5 x 9
  ID    Owner.1 Score.1 Owner.2 Score.2 Owner.3 Score.3 Owner.4 Score.4
  <chr> <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>
1 A     John    100     Derek   90      NA      NA      NA      NA
2 B     Sarah   80      NA      NA      NA      NA      NA      NA
3 C     Peter   85      Carlos  66      Angela  98      NA      NA
4 D     Ken     62      NA      NA      NA      NA      NA      NA
5 E     James   74      Nina    56      Gabby   85      Seth    77
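
The paste-and-separate approach above needs the maximum number of owners hard-coded and returns the scores as text. A possible alternative, offered only as a sketch and not as part of the answer above, is tidyr::pivot_wider() on a within-ID counter. It assumes tidyr 1.2.0 or later for the `names_vary` argument; the names `nr` and `wide` are illustrative, and the data is rebuilt here so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Rebuild the question's data
df <- data.frame(
  ID    = c('A','A','B','C','C','C','D','E','E','E','E'),
  Owner = c('John','Derek','Sarah','Peter','Carlos','Angela',
            'Ken','James','Nina','Gabby','Seth'),
  Score = c(100,90,80,85,66,98,62,74,56,85,77),
  stringsAsFactors = FALSE
)

wide <- df %>%
  group_by(ID) %>%
  mutate(nr = row_number()) %>%   # within-ID counter: 1, 2, ...
  ungroup() %>%
  pivot_wider(
    names_from  = nr,
    values_from = c(Owner, Score),
    names_sep   = ".",
    names_vary  = "slowest"       # interleave: Owner.1, Score.1, Owner.2, ...
  )
wide
```

Here the number of Owner/Score pairs follows from the data, and the Score columns stay numeric.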
