Cast using one variable as column name and another as a value source in R
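I have this dataset and I would like to recast it so that ID.name gives the rows, Canonical_Hugo_Symbol gives the column names, and Canonical_Protein_Change gives the cell values. It would be great if there were no NA, with the empty cells containing just 0 instead.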
mydata.df <- data.frame(
  ID.name = c("1000", "1000", "1000", "1001", "1001", "1001", "1002", "1002"),
  Canonical_Protein_Change = c("p.Y1467H", "p.R1466W", "p.*427Q", "p.V320fs",
                               "p.S5383fs", "p.D519V", "p.S51A", "p.K183_splice"),
  Canonical_Hugo_Symbol = c("gene1", "gene3", "gene1", "gene1",
                            "gene3", "gene4", "gene1", "gene2"))
I have melted it:
ff.melt <- melt(mydata.df, id.var = c("ID.name", "Canonical_Hugo_Symbol"))
ff.melt
ID.name Canonical_Hugo_Symbol variable value
1 1000 gene1 Canonical_Protein_Change p.Y1467H
2 1000 gene3 Canonical_Protein_Change p.R1466W
3 1000 gene1 Canonical_Protein_Change p.*427Q
4 1001 gene1 Canonical_Protein_Change p.V320fs
5 1001 gene3 Canonical_Protein_Change p.S5383fs
6 1001 gene4 Canonical_Protein_Change p.D519V
7 1002 gene1 Canonical_Protein_Change p.S51A
8 1002 gene2 Canonical_Protein_Change p.K183_splice
Then I recast it:
ff.cast <- dcast(ff.melt, ID.name ~ Canonical_Hugo_Symbol + value)
and I get this data frame:
ff.cast
ID.name gene1_p.*427Q gene1_p.S51A gene1_p.V320fs gene1_p.Y1467H gene2_p.K183_splice gene3_p.R1466W gene3_p.S5383fs
1 1000 p.*427Q <NA> <NA> p.Y1467H <NA> p.R1466W <NA>
2 1001 <NA> <NA> p.V320fs <NA> <NA> <NA> p.S5383fs
3 1002 <NA> p.S51A <NA> <NA> p.K183_splice <NA> <NA>
gene4_p.D519V
1 <NA>
2 p.D519V
3 <NA>
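It is close to what I want, but now each gene has many columns with different names. For example, I would like gene1_p.*427Q, gene1_p.S51A, gene1_p.V320fs and gene1_p.Y1467H to all end up in a single gene1 column.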
I have also tried:
dcast(mydata.df, ID.name ~ Canonical_Hugo_Symbol, value_var = "Canonical_Protein_Change" )
but I get this error message:
Error in .fun(.value[0], ...) : 2 arguments passed to 'length' which requires 1 >
Thanks.
I would like this table, or something similar. Thanks!
ID.name gene1 gene2 gene3 gene4
1 1000 Cp.*427Q 0 p.R1466W 0
2 1001 p.V320fs 0 p.S5383fs p.D519V
3 1002 p.S51A p.K183 0 0
When I try the following I get closer, but the column names are wrong:
reshape(mydata.df, direction = 'wide', idvar = 'ID.name', timevar = 'Canonical_Hugo_Symbol')
I have fixed the names:
colnames(mydata.reshape) <- sub("Canonical_Protein_Change.(.*?)","\\1", colnames(mydata.reshape))
but the NAs are still there.
You can try the following (note that the argument is spelled value.var, not value_var):
# concatenate values in cells with more than one value
dcast(mydata.df, ID.name ~ Canonical_Hugo_Symbol, value.var = "Canonical_Protein_Change",
fun.aggregate = function(x) paste(x, collapse = "; "), fill = "0")
# ID.name gene1 gene2 gene3 gene4
# 1 1000 p.Y1467H; p.*427Q 0 p.R1466W 0
# 2 1001 p.V320fs 0 p.S5383fs p.D519V
# 3 1002 p.S51A p.K183_splice 0 0
# ...or pick the first value in cells with more than one value
dcast(mydata.df, ID.name ~ Canonical_Hugo_Symbol, value.var = "Canonical_Protein_Change",
fun.aggregate = head, 1, fill = "0")
# ID.name gene1 gene2 gene3 gene4
# 1 1000 p.Y1467H 0 p.R1466W 0
# 2 1001 p.V320fs 0 p.S5383fs p.D519V
# 3 1002 p.S51A p.K183_splice 0 0
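If reshape2 is not available, roughly the same table as the first option can be built with base R only. This is a minimal sketch rather than part of the original answer: it assumes the columns are plain character vectors (hence the explicit stringsAsFactors = FALSE), and the names wide and result are just illustrative.

```r
# Sample data from the question, with strings kept as character vectors
mydata.df <- data.frame(
  ID.name = c("1000", "1000", "1000", "1001", "1001", "1001", "1002", "1002"),
  Canonical_Protein_Change = c("p.Y1467H", "p.R1466W", "p.*427Q", "p.V320fs",
                               "p.S5383fs", "p.D519V", "p.S51A", "p.K183_splice"),
  Canonical_Hugo_Symbol = c("gene1", "gene3", "gene1", "gene1",
                            "gene3", "gene4", "gene1", "gene2"),
  stringsAsFactors = FALSE
)

# Cross-tabulate ID.name (rows) by gene (columns); cells holding more than
# one protein change are collapsed into a single "; "-separated string
wide <- with(mydata.df,
             tapply(Canonical_Protein_Change,
                    list(ID.name, Canonical_Hugo_Symbol),
                    paste, collapse = "; "))

# ID/gene combinations with no data come back as NA; replace them with "0"
wide[is.na(wide)] <- "0"

# Turn the character matrix back into a data frame with an ID.name column
result <- data.frame(ID.name = rownames(wide), wide,
                     row.names = NULL, stringsAsFactors = FALSE)
result
```

To keep only the first value per cell instead, as in the second option, the aggregation function could be swapped for something like function(x) x[1].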