How to merge rows of binary matrix based on substring rowname matches?
If the rownames of the binary matrix match before the 4th "." delimiter, merge the two rows, where if either row has a 1, the column value will be 1. Also, remove everything after the 4th "." delimiter in the rownames.
Sample data:
structure(list(DNMT3A = c(1, 0, 0, 0, 0), IGF2R = c(1, 0, 0, 0, 1),
NBEA = c(1, 0, 0, 0, 1), ITGB5 = c(0, 1, 0, 0, 0)), row.names = c("TCGA.2Z.A9J1.01A.11D.A382.10",
"TCGA.B9.A5W9.01A.11D.A28G.10", "TCGA.2Z.A9JM.01A.13D.A44J.12", "TCGA.GL.A59R.01A.11D.A26P.10",
"TCGA.2Z.A9JM.01A.12D.A42J.10"), class = "data.frame")
Desired output:
structure(list(DNMT3A = c(1, 0, 0, 0), IGF2R = c(1, 0, 1, 0),
NBEA = c(1, 0, 1, 0), ITGB5 = c(0, 1, 0, 0)), row.names = c("TCGA.2Z.A9J1.01A",
"TCGA.B9.A5W9.01A", "TCGA.2Z.A9JM.01A", "TCGA.GL.A59R.01A"), class = "data.frame")
Try this:
split(dat1, substring(rownames(dat1), 1, 16)) |>
lapply(function(z) if (nrow(z) == 1) z else t(apply(z, 2, function(z) +any(z > 0)))) |>
do.call(rbind, args = _)
# DNMT3A IGF2R NBEA ITGB5
# TCGA.2Z.A9J1.01A 1 1 1 0
# TCGA.2Z.A9JM.01A 0 1 1 0
# TCGA.B9.A5W9.01A 0 0 0 1
# TCGA.GL.A59R.01A 0 0 0 0
Note that the use of args=_ with |> requires R 4.2.0 or later. Without that, one can use any of the following for the last line in the code block (the first one needs the magrittr pipe %>%):
... %>% do.call(rbind, .)
... |> (function(z) do.call(rbind, z))()
I'm naively assuming that all rownames have exactly the same number of characters in each "."-delimited substring; you may need to adapt the substring(...) call if that assumption is not true.
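If the fields are not all the same width, one possible adaptation is to build the grouping key with a regular expression that keeps the first four "."-delimited fields, instead of a fixed 16 characters. This is only a sketch: the names key and merged are illustrative and not part of the original answer, and the rest of the pipeline is the same logic written with plain do.call so it does not depend on the R version.

```r
# Sample data from the question
dat1 <- structure(list(DNMT3A = c(1, 0, 0, 0, 0), IGF2R = c(1, 0, 0, 0, 1),
  NBEA = c(1, 0, 0, 0, 1), ITGB5 = c(0, 1, 0, 0, 0)),
  row.names = c("TCGA.2Z.A9J1.01A.11D.A382.10", "TCGA.B9.A5W9.01A.11D.A28G.10",
    "TCGA.2Z.A9JM.01A.13D.A44J.12", "TCGA.GL.A59R.01A.11D.A26P.10",
    "TCGA.2Z.A9JM.01A.12D.A42J.10"), class = "data.frame")

# Keep the first four "."-delimited fields, whatever their widths.
# A rowname with fewer than four fields does not match and is left unchanged.
key <- sub("^(([^.]+\\.){3}[^.]+).*$", "\\1", rownames(dat1))

# Same merge as above, grouping on the regex-derived key:
# single-row groups pass through, multi-row groups become 1 where any value is > 0.
merged <- do.call(rbind, lapply(split(dat1, key), function(z)
  if (nrow(z) == 1) z else t(apply(z, 2, function(col) +any(col > 0)))))
merged
```

On the sample data every field has the same width, so key is identical to substring(rownames(dat1), 1, 16) and the merged rows should match the output shown earlier.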