
How to melt down replicate columns, choosing character values by conditions

I have a data.frame that looks like:

    mi      pos L08.92.s1 L08.92.s2 LD09.911.s1 LD09.911.s2 Storn.s1 Storn.s2 Storn.s3 Storn.s4 Tre
1 snp1 12713760        CC        CT          CC          CC       TT       TT       TT       TT  CC
2 snp2  8219379        AA        AA          --          AA       AA       AA       AA       AA  --
3 snp3  6595215        GG        GG          GG          GG       GG       --       GG       GT  GT
4 snp4 42348146        CC        CC          CC          CC       CC       CA       --       CA  AA
5 snp5  1809563        GG        GG          TT          TG       GG       GG       GG       GG  TT
6 snp6 34285723        TT        CC          --          --       TT       TT       TT       TT  CC
7 snp7 21533194        AA        AA          AG          --       AA       GG       GG       GG  AG

I expect the final data frame to look like:

    mi      pos L08.92 LD09.911 Storn Tre
1 snp1 12713760     CC       CC    TT  CC
2 snp2  8219379     AA       AA    AA  --
3 snp3  6595215     GG       GG    GG  GT
4 snp4 42348146     CC       CC    CC  AA
5 snp5  1809563     GG       TT    GG  TT
6 snp6 34285723     HH       --    TT  CC
7 snp7 21533194     AA       AG    HH  AG

Procedure: the replicate columns of each sample should be melted down into a single column per sample. The value is taken from the replicates according to the following rules:

  • if the value is the same for all replicates, keep it unchanged;
  • the priority order for accepting a value is: two identical letters > two different letters > "--";
  • if two different "homo" (two identical letters) values exist among the replicates, the value becomes "HH".

Thank you for your help!

You may try:

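# Strip the replicate suffix (".s1", ".s2", ...) to get each column's sample name
cols <- colnames(df)[-(1:2)]
indx <- gsub("[.][^.]+$", "", cols)
# Group replicate columns by sample, in order of first appearance
# (a plain split() sorts groups alphabetically, which may not match unique(indx))
lst <- split(cols, factor(indx, levels = unique(indx)))
Un <- c('AA', 'CC', 'GG', 'TT')  # values made of two identical letters

df2 <- df[, 1:2]
df2[names(lst)] <- lapply(lst, function(x)
         apply(df[x], 1, function(y) {
                  y1 <- unique(y)
                  y2 <- y1[y1 %in% Un]
                  if (length(y2) >= 2) 'HH'            # conflicting "homo" values
                  else if (length(y2) == 1) y2         # a single "homo" value wins
                  else sort(y1, decreasing = TRUE)[1]  # letters sort after "--"
               }))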

df2
#    mi      pos L08.92 LD09.911 Storn Tre
#1 snp1 12713760     CC       CC    TT  CC
#2 snp2  8219379     AA       AA    AA  --
#3 snp3  6595215     GG       GG    GG  GT
#4 snp4 42348146     CC       CC    CC  AA
#5 snp5  1809563     GG       TT    GG  TT
#6 snp6 34285723     HH       --    TT  CC
#7 snp7 21533194     AA       AG    HH  AG
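
To see how the rules play out on a single sample, the per-row logic can be pulled out into a standalone helper. This is only a sketch: the name `collapse_calls` is hypothetical and not part of the answer above. It also makes two assumptions that the question does not settle: three or more different "homo" values are treated like two, and when several different two-letter mixed values occur with no "homo" value, the first one seen is kept.

```r
# Hypothetical helper: collapse one sample's replicate values into one value.
collapse_calls <- function(y, homo = c("AA", "CC", "GG", "TT")) {
  y1 <- unique(y)
  y2 <- y1[y1 %in% homo]             # distinct "homo" values
  if (length(y2) >= 2) return("HH")  # assumption: two or more conflicts give "HH"
  if (length(y2) == 1) return(y2)    # a single "homo" value has top priority
  het <- setdiff(y1, "--")           # mixed values such as "AG", if any
  if (length(het) > 0) het[1] else "--"  # assumption: keep the first one seen
}

collapse_calls(c("CC", "CT"))              # "CC"
collapse_calls(c("TT", "CC"))              # "HH"
collapse_calls(c("AG", "--"))              # "AG"
collapse_calls(c("--", "--"))              # "--"
collapse_calls(c("AA", "GG", "GG", "GG"))  # "HH"
```

Using `setdiff()` instead of sorting avoids depending on how the locale orders "--" relative to letters.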

Data

 df <- structure(list(mi = c("snp1", "snp2", "snp3", "snp4", "snp5", 
 "snp6", "snp7"), pos = c(12713760L, 8219379L, 6595215L, 42348146L, 
1809563L, 34285723L, 21533194L), L08.92.s1 = c("CC", "AA", "GG", 
"CC", "GG", "TT", "AA"), L08.92.s2 = c("CT", "AA", "GG", "CC", 
"GG", "CC", "AA"), LD09.911.s1 = c("CC", "--", "GG", "CC", "TT", 
"--", "AG"), LD09.911.s2 = c("CC", "AA", "GG", "CC", "TG", "--", 
"--"), Storn.s1 = c("TT", "AA", "GG", "CC", "GG", "TT", "AA"), 
Storn.s2 = c("TT", "AA", "--", "CA", "GG", "TT", "GG"), Storn.s3 = c("TT", 
"AA", "GG", "--", "GG", "TT", "GG"), Storn.s4 = c("TT", "AA", 
"GT", "CA", "GG", "TT", "GG"), Tre = c("CC", "--", "GT", 
"AA", "TT", "CC", "AG")), .Names = c("mi", "pos", "L08.92.s1", 
"L08.92.s2", "LD09.911.s1", "LD09.911.s2", "Storn.s1", "Storn.s2", 
"Storn.s3", "Storn.s4", "Tre"), class = "data.frame", row.names = c(NA, 
-7L))
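
The column-grouping step of the answer can be inspected on its own, using only the column names of this data. The sketch below assumes that every replicate column ends in a suffix after its last dot (`.s1`, `.s2`, ...); an unreplicated column whose name contains a dot would be truncated by the same pattern. The variable names `sample_id` and `groups` are illustrative.

```r
cols <- c("L08.92.s1", "L08.92.s2", "LD09.911.s1", "LD09.911.s2",
          "Storn.s1", "Storn.s2", "Storn.s3", "Storn.s4", "Tre")

# Drop everything from the last dot onward; "Tre" has no dot and is unchanged
sample_id <- gsub("[.][^.]+$", "", cols)

# Group columns by sample, keeping samples in order of first appearance
groups <- split(cols, factor(sample_id, levels = unique(sample_id)))

names(groups)    # "L08.92" "LD09.911" "Storn" "Tre"
lengths(groups)  # 2 2 4 1
```

If the suffix is always `.s` followed by digits, a stricter pattern such as `"[.]s[0-9]+$"` would leave other dotted names alone.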
