How to melt down columns of different replications, with character values chosen by conditions
I have a data.frame that looks like:
mi pos L08.92.s1 L08.92.s2 LD09.911.s1 LD09.911.s2 Storn.s1 Storn.s2 Storn.s3 Storn.s4 Tre
1 snp1 12713760 CC CT CC CC TT TT TT TT CC
2 snp2 8219379 AA AA -- AA AA AA AA AA --
3 snp3 6595215 GG GG GG GG GG -- GG GT GT
4 snp4 42348146 CC CC CC CC CC CA -- CA AA
5 snp5 1809563 GG GG TT TG GG GG GG GG TT
6 snp6 34285723 TT CC -- -- TT TT TT TT CC
7 snp7 21533194 AA AA AG -- AA GG GG GG AG
I expect the final data frame to look like:
mi pos L08.92 LD09.911 Storn Tre
1 snp1 12713760 CC CC TT CC
2 snp2 8219379 AA AA AA --
3 snp3 6595215 GG GG GG GT
4 snp4 42348146 CC CC CC AA
5 snp5 1809563 GG TT GG TT
6 snp6 34285723 HH -- TT CC
7 snp7 21533194 AA AG HH AG
Procedure: for each sample, the replicate columns are melted down into a single column. The value is taken from the replications according to the following rules:
Thank you for your help!
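The list of rules did not survive in this copy of the question. Judging only from the expected output (so this is an inference, not the asker's stated rules), one reading is: a single distinct homozygous call (AA, CC, GG, TT) wins; two or more different homozygous calls give HH; otherwise a real call is preferred over the missing marker --. Below is a minimal R sketch of that rule for one SNP and one sample. The helper name collapse_calls is hypothetical, and the tie-break between several heterozygous calls is an arbitrary assumption.

```r
# Hypothetical helper (not from the question or the answer): collapse the
# calls of one sample's replicates for one SNP into a single genotype.
# The rules are inferred from the expected output and are assumptions.
collapse_calls <- function(calls, homozygous = c("AA", "CC", "GG", "TT")) {
  distinct <- unique(calls)
  hom <- distinct[distinct %in% homozygous]
  # Exactly one distinct homozygous call: keep it (e.g. CC + CT -> CC)
  if (length(hom) == 1) return(hom)
  # Two or more different homozygous calls conflict: report "HH"
  if (length(hom) >= 2) return("HH")
  # No homozygous call: prefer any real genotype over the missing marker "--"
  called <- distinct[distinct != "--"]
  if (length(called) == 0) return("--")
  # Arbitrary tie-break among heterozygous calls: take the one that sorts last
  sort(called, decreasing = TRUE)[1]
}

collapse_calls(c("CC", "CT"))              # "CC"  (snp1, L08.92)
collapse_calls(c("TT", "CC"))              # "HH"  (snp6, L08.92)
collapse_calls(c("AA", "GG", "GG", "GG"))  # "HH"  (snp7, Storn)
collapse_calls(c("AG", "--"))              # "AG"  (snp7, LD09.911)
```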
You may try:
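indx <- gsub("[.][^.]+$", "", colnames(df)[-(1:2)])  # sample name: drop the replicate suffix, e.g. ".s1"
lst <- split(colnames(df)[-(1:2)], indx)              # replicate columns grouped by sample
Un <- c('AA', 'CC', 'GG', 'TT')                        # homozygous genotypes
df2 <- df[, 1:2]
# assign by name: split() orders its groups alphabetically, not as in indx
df2[names(lst)] <- lapply(lst, function(x)
  apply(df[x], 1, function(y) {
    y1 <- unique(y)
    y2 <- y1[y1 %in% Un]
    if (length(y2) == 0) sort(y1, decreasing = TRUE)[1]  # no homozygous call: e.g. "AG" rather than "--"
    else if (length(y2) >= 2) 'HH'                       # conflicting homozygous calls
    else y2                                              # exactly one homozygous call
  }))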
df2
# mi pos L08.92 LD09.911 Storn Tre
#1 snp1 12713760 CC CC TT CC
#2 snp2 8219379 AA AA AA --
#3 snp3 6595215 GG GG GG GT
#4 snp4 42348146 CC CC CC AA
#5 snp5 1809563 GG TT GG TT
#6 snp6 34285723 HH -- TT CC
#7 snp7 21533194 AA AG HH AG
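The first two lines of the answer do the grouping. Run on the column names alone (copied from the question), they give one vector of replicate columns per sample. The last two lines compare the answer's pattern with a stricter one, `\\.s[0-9]+$`, which assumes (this is not stated in the question) that every replicate suffix is `.s` followed by a number; "L08.92" there is a hypothetical unreplicated column name.

```r
# Replicate column names from the question (everything after mi and pos)
cols <- c("L08.92.s1", "L08.92.s2", "LD09.911.s1", "LD09.911.s2",
          "Storn.s1", "Storn.s2", "Storn.s3", "Storn.s4", "Tre")

# The answer's pattern drops the last ".suffix"; "Tre" has no dot and is kept
indx <- gsub("[.][^.]+$", "", cols)
lst <- split(cols, indx)   # named list: one vector of column names per sample
str(lst)

# Caveat: an unreplicated sample whose own name contains a dot is also cut.
gsub("[.][^.]+$", "", "L08.92")   # "L08"
sub("\\.s[0-9]+$", "", "L08.92")  # "L08.92" (stricter, assumed suffix form)
```

With the question's column names both patterns give the same groups; the stricter one only matters if an unreplicated sample name contains a dot.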
# data
df <- structure(list(mi = c("snp1", "snp2", "snp3", "snp4", "snp5",
"snp6", "snp7"), pos = c(12713760L, 8219379L, 6595215L, 42348146L,
1809563L, 34285723L, 21533194L), L08.92.s1 = c("CC", "AA", "GG",
"CC", "GG", "TT", "AA"), L08.92.s2 = c("CT", "AA", "GG", "CC",
"GG", "CC", "AA"), LD09.911.s1 = c("CC", "--", "GG", "CC", "TT",
"--", "AG"), LD09.911.s2 = c("CC", "AA", "GG", "CC", "TG", "--",
"--"), Storn.s1 = c("TT", "AA", "GG", "CC", "GG", "TT", "AA"),
Storn.s2 = c("TT", "AA", "--", "CA", "GG", "TT", "GG"), Storn.s3 = c("TT",
"AA", "GG", "--", "GG", "TT", "GG"), Storn.s4 = c("TT", "AA",
"GT", "CA", "GG", "TT", "GG"), Tre = c("CC", "--", "GT",
"AA", "TT", "CC", "AG")), .Names = c("mi", "pos", "L08.92.s1",
"L08.92.s2", "LD09.911.s1", "LD09.911.s2", "Storn.s1", "Storn.s2",
"Storn.s3", "Storn.s4", "Tre"), class = "data.frame", row.names = c(NA,
-7L))
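The same approach can be wrapped in a reusable function. This is a sketch, not the answerer's code: the names collapse_replicates, n_id and pick are hypothetical; the first n_id columns are assumed to be identifiers; replicate columns are assumed to end in .s<number>; and "--" is returned only when no replicate has a real call. The data are rebuilt with data.frame() so the block runs on its own.

```r
# Hypothetical wrapper around the answer's approach
collapse_replicates <- function(df, n_id = 2,
                                homozygous = c("AA", "CC", "GG", "TT")) {
  # Collapse one row of replicate calls (rules inferred from expected output)
  pick <- function(calls) {
    distinct <- unique(calls)
    hom <- distinct[distinct %in% homozygous]
    if (length(hom) == 1) return(hom)
    if (length(hom) >= 2) return("HH")
    called <- distinct[distinct != "--"]
    if (length(called) == 0) "--" else sort(called, decreasing = TRUE)[1]
  }
  geno_cols <- colnames(df)[-seq_len(n_id)]
  sample_id <- sub("\\.s[0-9]+$", "", geno_cols)
  # split() orders groups alphabetically; reindex to first appearance
  groups <- split(geno_cols, sample_id)[unique(sample_id)]
  out <- df[seq_len(n_id)]
  for (s in names(groups)) {
    # all selected columns are character, so as.matrix() gives a character matrix
    out[[s]] <- apply(as.matrix(df[groups[[s]]]), 1, pick)
  }
  out
}

df <- data.frame(
  mi  = paste0("snp", 1:7),
  pos = c(12713760L, 8219379L, 6595215L, 42348146L, 1809563L, 34285723L, 21533194L),
  L08.92.s1   = c("CC", "AA", "GG", "CC", "GG", "TT", "AA"),
  L08.92.s2   = c("CT", "AA", "GG", "CC", "GG", "CC", "AA"),
  LD09.911.s1 = c("CC", "--", "GG", "CC", "TT", "--", "AG"),
  LD09.911.s2 = c("CC", "AA", "GG", "CC", "TG", "--", "--"),
  Storn.s1    = c("TT", "AA", "GG", "CC", "GG", "TT", "AA"),
  Storn.s2    = c("TT", "AA", "--", "CA", "GG", "TT", "GG"),
  Storn.s3    = c("TT", "AA", "GG", "--", "GG", "TT", "GG"),
  Storn.s4    = c("TT", "AA", "GT", "CA", "GG", "TT", "GG"),
  Tre         = c("CC", "--", "GT", "AA", "TT", "CC", "AG"),
  stringsAsFactors = FALSE
)

res <- collapse_replicates(df)
res
```

With this data, res matches the expected data frame in the question column for column. Reindexing split()'s result by unique(sample_id) keeps the samples in their original column order instead of relying on the alphabetical order of split()'s groups.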