Cast using one variable as column name and another as a value source in R

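I have this dataset and I want to recast it so that ID.name identifies the rows, Canonical_Hugo_Symbol supplies the column names, and Canonical_Protein_Change supplies the cell values. It would be great if there were no NAs and the empty cells just held 0.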

mydata.df <- data.frame(ID.name = c("1000", "1000", "1000", "1001","1001","1001","1002","1002" ), Canonical_Protein_Change = c("p.Y1467H", "p.R1466W", "p.*427Q", "p.V320fs","p.S5383fs","p.D519V","p.S51A", "p.K183_splice" ), Canonical_Hugo_Symbol = c("gene1", "gene3", "gene1", "gene1","gene3","gene4","gene1", "gene2" ))

I have melted it:

library(reshape2)
ff.melt <- melt(mydata.df, id.vars = c("ID.name", "Canonical_Hugo_Symbol"))

ff.melt
  ID.name Canonical_Hugo_Symbol                 variable         value
1    1000                 gene1 Canonical_Protein_Change      p.Y1467H
2    1000                 gene3 Canonical_Protein_Change      p.R1466W
3    1000                 gene1 Canonical_Protein_Change       p.*427Q
4    1001                 gene1 Canonical_Protein_Change      p.V320fs
5    1001                 gene3 Canonical_Protein_Change     p.S5383fs
6    1001                 gene4 Canonical_Protein_Change       p.D519V
7    1002                 gene1 Canonical_Protein_Change        p.S51A
8    1002                 gene2 Canonical_Protein_Change p.K183_splice

Then I recast it:

ff.cast <- dcast(ff.melt, ID.name ~ Canonical_Hugo_Symbol + value)

I get this data frame:

ff.cast
  ID.name gene1_p.*427Q gene1_p.S51A gene1_p.V320fs gene1_p.Y1467H gene2_p.K183_splice gene3_p.R1466W gene3_p.S5383fs
1    1000       p.*427Q         <NA>           <NA>       p.Y1467H                <NA>       p.R1466W            <NA>
2    1001          <NA>         <NA>       p.V320fs           <NA>                <NA>           <NA>       p.S5383fs
3    1002          <NA>       p.S51A           <NA>           <NA>       p.K183_splice           <NA>             <NA>
  gene4_p.D519V
1          <NA>
2       p.D519V
3          <NA>

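It is close to what I want, but now each gene has many columns with different names. For example, I would like gene1_p.*427Q, gene1_p.S51A, gene1_p.V320fs and gene1_p.Y1467H to all end up in a single gene1 column.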

I have also tried:

dcast(mydata.df, ID.name ~ Canonical_Hugo_Symbol, value_var = "Canonical_Protein_Change" )

But I get this error message:

Error in .fun(.value[0], ...) : 2 arguments passed to 'length' which requires 1

I would like this table, or something similar. Thanks!

  ID.name    gene1  gene2     gene3   gene4
1    1000  p.*427Q      0  p.R1466W       0
2    1001 p.V320fs      0 p.S5383fs p.D519V
3    1002   p.S51A p.K183         0       0

I get closer when I try the following, but the column names are wrong:

mydata.reshape <- reshape(mydata.df, direction = 'wide', idvar = 'ID.name', timevar = 'Canonical_Hugo_Symbol')

I have fixed the names:

colnames(mydata.reshape) <- sub("^Canonical_Protein_Change\\.", "", colnames(mydata.reshape))

But the NAs are still there.
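A note on why the plain dcast() attempt fails: in reshape2 the argument is spelled value.var, so the misspelled value_var is most likely passed on through ... to the default aggregation function (length), which would explain the "2 arguments passed to 'length'" message. Even with the correct spelling, one ID/gene pair holds more than one value, so an aggregation function is still needed. The base R sketch below checks this; the name counts is only illustrative.

```r
mydata.df <- data.frame(
  ID.name = c("1000", "1000", "1000", "1001", "1001", "1001", "1002", "1002"),
  Canonical_Protein_Change = c("p.Y1467H", "p.R1466W", "p.*427Q", "p.V320fs",
                               "p.S5383fs", "p.D519V", "p.S51A", "p.K183_splice"),
  Canonical_Hugo_Symbol = c("gene1", "gene3", "gene1", "gene1",
                            "gene3", "gene4", "gene1", "gene2"),
  stringsAsFactors = FALSE
)

# Count how many protein changes each (ID, gene) pair has
counts <- table(mydata.df$ID.name, mydata.df$Canonical_Hugo_Symbol)
print(counts)

# Any count above 1 means several values compete for a single cell
print(which(counts > 1, arr.ind = TRUE))
```

Here only ID 1000 / gene1 has two entries, so that is the cell where values have to be combined or one of them picked.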

You can try the following:

# concatenate values in cells with more than one value  
dcast(mydata.df, ID.name ~ Canonical_Hugo_Symbol, value.var = "Canonical_Protein_Change",
      fun.aggregate = function(x) paste(x, collapse = "; "), fill = "0")

#   ID.name             gene1         gene2     gene3   gene4
# 1    1000 p.Y1467H; p.*427Q             0  p.R1466W       0
# 2    1001          p.V320fs             0 p.S5383fs p.D519V
# 3    1002            p.S51A p.K183_splice         0       0

# ...or pick the first value in cells with more than one value
dcast(mydata.df, ID.name ~ Canonical_Hugo_Symbol, value.var = "Canonical_Protein_Change",
      fun.aggregate = head, 1, fill = "0")
#   ID.name    gene1         gene2     gene3   gene4
# 1    1000 p.Y1467H             0  p.R1466W       0
# 2    1001 p.V320fs             0 p.S5383fs p.D519V
# 3    1002   p.S51A p.K183_splice         0       0
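
If reshape2 is not available, roughly the same table can be built in base R. This is a minimal sketch of an alternative, not part of the dcast() approach above: it collapses duplicates with aggregate(), widens with reshape(), strips the column prefix, and then replaces NA with "0". The names agg and wide are illustrative, and stringsAsFactors = FALSE is assumed so that the columns are plain character vectors.

```r
mydata.df <- data.frame(
  ID.name = c("1000", "1000", "1000", "1001", "1001", "1001", "1002", "1002"),
  Canonical_Protein_Change = c("p.Y1467H", "p.R1466W", "p.*427Q", "p.V320fs",
                               "p.S5383fs", "p.D519V", "p.S51A", "p.K183_splice"),
  Canonical_Hugo_Symbol = c("gene1", "gene3", "gene1", "gene1",
                            "gene3", "gene4", "gene1", "gene2"),
  stringsAsFactors = FALSE
)

# Collapse multiple protein changes per (ID, gene) pair into one string
agg <- aggregate(Canonical_Protein_Change ~ ID.name + Canonical_Hugo_Symbol,
                 data = mydata.df,
                 FUN = function(x) paste(x, collapse = "; "))

# One row per ID, one column per gene
wide <- reshape(agg, direction = "wide", idvar = "ID.name",
                timevar = "Canonical_Hugo_Symbol")

# Drop the "Canonical_Protein_Change." prefix that reshape() adds
names(wide) <- sub("^Canonical_Protein_Change\\.", "", names(wide))

# Replace missing combinations with the string "0"
wide[is.na(wide)] <- "0"

print(wide)
```

Because the cells are character strings, the fill value is the string "0" rather than the number 0, the same as with fill = "0" in the dcast() calls.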
