Spreading a dataframe in R with multiple custom columns
I'm not sure how to approach this problem and would like some insight. I have multiple owners for each unique ID, so the ID is repeated more than once, one row per owner in the Owner column. I would like to spread the owners (and their scores) into separate columns so that each unique ID appears only once, whether it has one owner or several. Any help would be greatly appreciated. Thanks!
This is what it looks like before:
df <- as.data.frame(matrix(NA, nrow = 11, ncol = 3))
df$V1 <- c('A','A','B','C','C','C','D','E','E','E','E')
df$V2 <- c('John','Derek','Sarah','Peter','Carlos','Angela','Ken','James','Nina','Gabby','Seth')
df$V3 <- c(100,90,80,85,66,98,62,74,56,85,77)
colnames(df) <- c('ID','Owner','Score')
This is what I want it to look like after:
df_out <- as.data.frame(matrix(NA,nrow = 5, ncol = 9))
df_out$V1 <- c('A','B','C','D','E')
df_out$V2 <- c('John','Sarah','Peter','Ken','James')
df_out$V3 <- c(100,80,85,62,74)
df_out$V4 <- c('Derek',NA,'Carlos',NA,'Nina')
df_out$V5 <- c(90,NA,66,NA,56)
df_out$V6 <- c(NA,NA,'Angela',NA,'Gabby')
df_out$V7 <- c(NA,NA,98,NA,85)
df_out$V8 <- c(NA,NA,NA,NA,'Seth')
df_out$V9 <- c(NA,NA,NA,NA,77)
colnames(df_out) <- c('ID','Owner','Score','Owner.2','Score.2','Owner.3','Score.3','Owner.4','Score.4')
Please excuse my code, I'm still a beginner!
Here is an option using data.table::dcast, which pivots ID (your row label) against the row number within each ID (your column label), using Owner and Score as the values to be pivoted:
library(data.table)
setDT(df)[, nr := rowid(ID)]
ans <- dcast(df, ID ~ nr, sep=".", value.var=c("Owner","Score"))
ans
output:
   ID Owner.1 Owner.2 Owner.3 Owner.4 Score.1 Score.2 Score.3 Score.4
1:  A    John   Derek    <NA>    <NA>     100      90      NA      NA
2:  B   Sarah    <NA>    <NA>    <NA>      80      NA      NA      NA
3:  C   Peter  Carlos  Angela    <NA>      85      66      98      NA
4:  D     Ken    <NA>    <NA>    <NA>      62      NA      NA      NA
5:  E   James    Nina   Gabby    Seth      74      56      85      77
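The key step above is the helper column nr: rowid(ID) numbers the rows within each ID, and those numbers become the column suffixes after dcast. A minimal sketch on a reduced copy of the data (the name `small` is made up for illustration and is not part of the answer):

```r
library(data.table)

# Reduced, hypothetical copy of the question's data
small <- data.table(
  ID    = c("A", "A", "B", "C", "C", "C"),
  Owner = c("John", "Derek", "Sarah", "Peter", "Carlos", "Angela")
)

# rowid() restarts the count at 1 for every distinct ID
small[, nr := rowid(ID)]
print(small$nr)
# 1 2 1 1 2 3
```

Because B has only one row, it only ever receives nr = 1, which is why its .2, .3 and .4 columns are filled with NA after the pivot.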
To reorder into your specific column order, you can order the columns by the numeric suffix in their names (i.e. the .1, .2, .3, etc.) as follows:
nm <- names(ans)[-1L]
cols <- nm[order(sapply(strsplit(nm, "\\."), `[`, 2))]
setcolorder(ans, c("ID", cols))
ans
output:
   ID Owner.1 Score.1 Owner.2 Score.2 Owner.3 Score.3 Owner.4 Score.4
1:  A    John     100   Derek      90    <NA>      NA    <NA>      NA
2:  B   Sarah      80    <NA>      NA    <NA>      NA    <NA>      NA
3:  C   Peter      85  Carlos      66  Angela      98    <NA>      NA
4:  D     Ken      62    <NA>      NA    <NA>      NA    <NA>      NA
5:  E   James      74    Nina      56   Gabby      85    Seth      77
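One caveat worth hedging: the ordering above compares the suffixes as strings, so with ten or more owners per ID a suffix such as "10" would sort before "2". A sketch of a numeric variant, using made-up column names rather than the real result (the ten-owner case is an assumption, it does not occur in the question's data):

```r
# Hypothetical column names for an ID with ten owners
nm <- c(paste0("Owner.", 1:10), paste0("Score.", 1:10))

# Take the part after the last dot and convert it to an integer before ordering;
# ties keep their original order, so Owner.k stays ahead of Score.k
idx <- as.integer(sub(".*\\.", "", nm))
cols <- nm[order(idx)]

print(head(cols, 4))
print(tail(cols, 2))
```

The resulting cols vector can then be passed to setcolorder(ans, c("ID", cols)) exactly as before.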
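An alternative using dplyr and tidyr:
library(dplyr)
library(tidyr)
df %>%
  group_by(ID) %>%
  # First collect all Owners and Scores for each ID in one comma-separated string
  summarise(own = paste0(Owner, collapse = ','), sco = paste0(Score, collapse = ',')) %>%
  # Separate Owners and Scores into their specific columns using tidyr::separate;
  # fill = 'right' pads IDs with fewer than four owners with NA without a warning
  separate(own, into = c('Owner.1','Owner.2','Owner.3','Owner.4'), sep = ',', fill = 'right') %>%
  separate(sco, into = c('Score.1','Score.2','Score.3','Score.4'), sep = ',', fill = 'right') %>%
  # Rearrange columns as in the question
  # (the Score columns are character here; add convert = TRUE to separate() for numbers)
  select(ID, Owner.1, Score.1, Owner.2, Score.2, Owner.3, Score.3, Owner.4, Score.4)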
# A tibble: 5 x 9
  ID    Owner.1 Score.1 Owner.2 Score.2 Owner.3 Score.3 Owner.4 Score.4
  <chr> <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>
1 A     John    100     Derek   90      NA      NA      NA      NA
2 B     Sarah   80      NA      NA      NA      NA      NA      NA
3 C     Peter   85      Carlos  66      Angela  98      NA      NA
4 D     Ken     62      NA      NA      NA      NA      NA      NA
5 E     James   74      Nina    56      Gabby   85      Seth    77
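A possible variant, assuming tidyr 1.2.0 or later (needed for the names_vary argument): pivot_wider can spread Owner and Score directly, which keeps Score numeric instead of converting it to text and does not require knowing the maximum number of owners in advance. This is only a sketch and not part of the answers above; the nr helper column and the name `wide` are assumptions introduced for illustration.

```r
library(dplyr)
library(tidyr)

# Same data as in the question
df <- data.frame(
  ID    = c('A','A','B','C','C','C','D','E','E','E','E'),
  Owner = c('John','Derek','Sarah','Peter','Carlos','Angela','Ken','James','Nina','Gabby','Seth'),
  Score = c(100,90,80,85,66,98,62,74,56,85,77),
  stringsAsFactors = FALSE
)

wide <- df %>%
  group_by(ID) %>%
  mutate(nr = row_number()) %>%   # position of each owner within its ID
  ungroup() %>%
  pivot_wider(
    names_from  = nr,
    values_from = c(Owner, Score),
    names_sep   = ".",
    names_vary  = "slowest"       # interleave: Owner.1, Score.1, Owner.2, ...
  )

print(wide)
```

With names_vary = "slowest" the columns should come out already interleaved, so no separate select() step is needed for the ordering.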