How to expand the column values (pairwise) to expand the dataframe in R

I have a data frame called mydf. I want to expand it so that every column holds the pairwise combinations of the samples' values, separated by ":", giving the result shown below:

mydf<-structure(list(Sample = c("1749742002_A", "1749742086_A", "1749742184_A"
), Call.Rate = c(0.9995, 0.9992, 0.999), Study = c(133, 133, 
133), Ethnicity = c("Adygei", "Maya", "Adygei")), .Names = c("Sample", 
"Call.Rate", "Study", "Ethnicity"), row.names = c(NA, 3L), class = "data.frame")

Result:

Sample                         Call.Rate        Study     Ethnicity
 1749742002_A:1749742086_A    0.9995:0.9992   133:133   Adygei:Maya
 1749742086_A:1749742002_A    0.9992:0.9995   133:133   Maya:Adygei
 1749742086_A:1749742184_A    0.9992:0.9990   133:133   Maya:Adygei
 1749742184_A:1749742002_A    0.9990:0.9995   133:133   Adygei:Adygei

and so on.

We can use

data.frame(lapply(mydf, function(x) if(length(unique(x)) >1) 
         do.call(paste, c(expand.grid(x,x), sep=":"))
         else paste(x[1], x[1], sep=":")))
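
Note that expand.grid(x, x) also pairs every value with itself, and the single-valued Study column is only filled in by recycling. If the self-pairs are not wanted, one possible variant is to build the pairs from row indices instead. This is a minimal sketch in base R; pair_rows() is a hypothetical helper name introduced here for illustration and is not part of the original answer.

```r
# Hypothetical helper (an assumption, not from the original answer):
# builds every ordered pair of rows, skips self-pairs, and pastes each column.
pair_rows <- function(df, sep = ":") {
  n <- nrow(df)
  # all ordered pairs of row indices; the first index varies fastest
  idx <- expand.grid(i = seq_len(n), j = seq_len(n))
  # drop pairs of a row with itself
  idx <- idx[idx$i != idx$j, ]
  out <- lapply(df, function(x) paste(x[idx$i], x[idx$j], sep = sep))
  data.frame(out, stringsAsFactors = FALSE)
}

mydf <- data.frame(
  Sample = c("1749742002_A", "1749742086_A", "1749742184_A"),
  Call.Rate = c(0.9995, 0.9992, 0.999),
  Study = c(133, 133, 133),
  Ethnicity = c("Adygei", "Maya", "Adygei"),
  stringsAsFactors = FALSE
)

paired <- pair_rows(mydf)
nrow(paired)  # 3 rows give 3 * 2 = 6 ordered pairs
```

Because the same index pairs are used for every column, the columns stay aligned row by row regardless of how many distinct values each one has.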

Here is a version that uses intermediate data frames so that the steps are easier to follow. The final sort may need to be tweaked if order is important.

# slightly cleaner data creation 
mydf<-data.frame(
  Sample = c("1749742002_A", "1749742086_A", "1749742184_A"), 
  Call.Rate = c(0.9995, 0.9992, 0.999), 
  Study = c(133, 133,  133), 
  Ethnicity = c("Adygei", "Maya", "Adygei"))

library(dplyr)

# use dplyr::lead to create a data frame offset by 1 row
# and prefix the column names with "y."
mydf_lead <- data.frame(lapply(mydf, lead)) 
names(mydf_lead) <- paste0("y.", names(mydf))

# cbind the original with the lead df and drop the last row, which has no partner
# (note: this pairs each row only with the row that follows it)
mydf2 <- cbind(mydf, mydf_lead) %>% filter(!is.na(y.Sample))

# create the a:b and b:a variations as separate data frames, then fix the column names
# (paste() uses its default sep = " ", so the values are joined as "a : b")
mydf_ab <- data.frame(lapply(1:ncol(mydf), function(i) {paste(mydf2[,i], ":", mydf2[,i+ncol(mydf)])}))
mydf_ba <- data.frame(lapply(1:ncol(mydf), function(i) {paste(mydf2[,i+ncol(mydf)], ":", mydf2[,i])}))
names(mydf_ab) <- names(mydf_ba) <- names(mydf)

# rbind the results, and sort 
result <- rbind(mydf_ab, mydf_ba) %>% 
  arrange(Sample)

result

                       Sample       Call.Rate     Study     Ethnicity
1 1749742002_A : 1749742086_A 0.9995 : 0.9992 133 : 133 Adygei : Maya
2 1749742086_A : 1749742184_A  0.9992 : 0.999 133 : 133 Maya : Adygei
3 1749742086_A : 1749742002_A 0.9992 : 0.9995 133 : 133 Maya : Adygei
4 1749742184_A : 1749742086_A  0.999 : 0.9992 133 : 133 Adygei : Maya
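
The lead-based steps pair each row only with the row directly after it, so the first and third samples never meet. If every pair is needed, the same a:b / b:a idea might be applied to the index pairs returned by combn() from base R. The following is a sketch under that assumption; paste_pairs() is a hypothetical helper name, and the code expects a data frame with at least two rows.

```r
mydf <- data.frame(
  Sample = c("1749742002_A", "1749742086_A", "1749742184_A"),
  Call.Rate = c(0.9995, 0.9992, 0.999),
  Study = c(133, 133, 133),
  Ethnicity = c("Adygei", "Maya", "Adygei"),
  stringsAsFactors = FALSE
)

# each column of pairs_idx is one unordered pair of row numbers
pairs_idx <- combn(nrow(mydf), 2)
a <- pairs_idx[1, ]
b <- pairs_idx[2, ]

# hypothetical helper: paste rows i and j of every column with ":"
paste_pairs <- function(df, i, j) {
  data.frame(lapply(df, function(x) paste(x[i], x[j], sep = ":")),
             stringsAsFactors = FALSE)
}

# a:b and b:a variations, stacked and sorted by Sample
all_pairs <- rbind(paste_pairs(mydf, a, b), paste_pairs(mydf, b, a))
all_pairs <- all_pairs[order(all_pairs$Sample), ]
rownames(all_pairs) <- NULL
```

Using sep = ":" here joins the values without the surrounding spaces, which is closer to the format requested in the question.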
