How do I correct my code to run a stacked bar chart using ggplot2 in R?

My data includes 6 samples (currently as row names) and 24 columns, each named after a different bacterial species; the numbers are relative abundances.

Here is the structure:

dput(sig_speciesstacked) 

structure(c("Control1", "Control2", "Control3", "Disease1", "Disease2", "Disease3",
"0.32503", "0.55197", "1.23225", "0", "0", "0",
"0.11568", "1.27372", "0.04306", "0", "0", "0",
"0.78402", "0.99583", "0.03723", "0", "0", "0",
"0.07664", "0.0932", "0.28018", "0", "0", "0",
"0.29037", "0.74246", "0.3061", "0", "0", "0",
"0.22328", "0.40351", "0.00416", "0", "0", "0",
"0", "0", "0", "0.23779", "0.70807", "0.00891",
"0.04852", "0.34497", "0.19266", "0", "0", "0",
"0.26408", "0.05026", "0.0022", "0", "0", "0",
"0.31206", "0.59428", "0.15606", "0", "0", "0",
"0.13716", "0.55023", "0.4716", "0", "0", "0",
"0.27194", "0.57013", "0.23164", "0", "0", "0",
"6.84233", "2.18166", "0.6827", "0", "0", "0",
"0", "0", "0", "0.94569", "0.0108", "0.06016",
"0.32686", "0.04407", "1.02125", "0", "0", "0",
"0", "0", "0", "0.51243", "0.10427", "1.48269",
"0", "0", "0", "1.49594", "0.90364", "0.0081",
"1.27002", "1.80154", "0.33065", "0", "0", "0",
"2.40484", "0.36535", "3.79276", "0", "0", "0",
"4.23202", "2.63742", "0.37963", "0", "0", "0",
"0.38793", "0.81874", "0.04095", "0", "0", "0",
"0", "0", "0", "1.04847", "0.08983", "0.02608",
"0", "0", "0", "0.14408", "0.1637", "0.07754"),
.Dim = c(6L, 24L), .Dimnames = list(c("1", "2", "3", "4", "5", "6"), c("Sample", "Alistipes_finegoldii", "Alistipes_indistinctus", "Alistipes_onderdonkii", "Alistipes_senegalensis", "Bacteroidales_bacterium_ph8", "Bifidobacterium_adolescentis", "Bifidobacterium_dentium", "Collinsella_aerofaciens", "Coprobacter_fastidiosus", "Coprococcus_comes", "Dorea_longicatena", "Eubacterium_hallii", "Eubacterium_rectale", "Fusobacterium_varium", "Lachnospiraceae_bacterium_3_1_46FAA", "Lactobacillus_mucosae", "Megasphaera_micronuciformis", "Odoribacter_splanchnicus", "Roseburia_hominis", "Ruminococcus_bromii", "Ruminococcus_callidus", "Streptococcus_parasanguinis", "Veillonella_atypica")))

I am trying to make a stacked bar chart showing the different abundances for the different samples (3 control and 3 disease).

First I added a named column containing my sample names, so there were then 25 columns in total. The 1st contains the samples; columns 2:25 contain the abundances of the 24 different species.

 sig_speciesstacked <- cbind(Samples= rownames(sig_speciesstacked), sig_speciesstacked)

 print(colnames(sig_speciesstacked))

 rownames(sig_speciesstacked) <- c("1", "2", "3", "4", "5", "6")

I have already installed and loaded reshape2. The code I then run is

 sig_speciesstackplot1 <- melt(sig_speciesstacked, id.vars = "Samples", variable.name = "species")

 pdf("Stackedbarplot.species.pdf", width = 6, height = 7)
 ggplot(sig_speciesstackplot1,aes(x=Samples, y=value, fill= species))+ geom_bar(stat = 
 "identity", position="fill")

The error I get is Error in FUN(X[[i]], ...) : object 'Samples' not found; after that it becomes abundance not found, then species not found.

Edit: I understand I have to change aes(x=, y=) to use the column names of sig_speciesstackplot1. However, surely this is not the correct format for the sig_speciesstackplot1 output after melt?

       Var1    Var2                 value
     1  1   Samples               Control1
     2  2   Samples               Control2
     3  3   Samples               Control3
     4  4   Samples               Disease1
     5  5   Samples               Disease2
     6  6   Samples               Disease3
     7  1   Alistipes_finegoldii    0.32503
     8  2   Alistipes_finegoldii    0.55197
     9  3   Alistipes_finegoldii    1.23225
     10 4   Alistipes_finegoldii    0

And so on: each of the 24 species is repeated 6 times, with different abundance levels corresponding to the different samples.

I am not sure why Var1 and Var2 were not renamed to "Samples" and "species" respectively by my line of code above, or why the output looks like that.

Running ggplot with aes(x = Var1 etc.) produces a plot that is completely wrong.

Edit: For anybody having a similar issue, please do not use cbind. I followed an example on here that put the sample names in column 1, which is why I used it. If you skip that step and keep the sample names as row names, it works fine. Thank you very much to those who helped below!

Nice that you are posting here for the first time. I don't know if I understand your question correctly, but here is my attempt at solving it.

Please note that my approach uses pipes (%>%) and the function pivot_longer from the tidyverse (it lives in the tidyr package) instead of melt.

# load the needed package (includes ggplot2); install it first if not installed yet
library("tidyverse")

# putting your data into an object
sig_speciesstacked <- structure(c(0.32503, 0.55197, 1.23225, 0, 0, 0, 0.11568, 1.27372, 0.04306, 0, 0, 0, 0.78402, 0.99583, 0.03723, 0, 0, 0, 0.07664, 0.0932, 0.28018, 0, 0, 0, 0.29037, 0.74246, 0.3061, 0, 0, 0, 0.22328, 0.40351, 0.00416, 0, 0, 0, 0, 0, 0, 0.23779, 0.70807, 0.00891, 0.04852, 0.34497, 0.19266, 0, 0, 0, 0.26408, 0.05026, 0.0022, 0, 0, 0, 0.31206, 0.59428, 0.15606, 0, 0, 0, 0.13716, 0.55023, 0.4716, 0, 0, 0, 0.27194, 0.57013, 0.23164, 0, 0, 0, 6.84233, 2.18166, 0.6827, 0, 0, 0, 0, 0, 0, 0.94569, 0.0108, 0.06016, 0.32686, 0.04407, 1.02125, 0, 0, 0, 0, 0, 0, 0.51243, 0.10427, 1.48269, 0, 0, 0, 1.49594, 0.90364, 0.0081, 1.27002, 1.80154, 0.33065, 0, 0, 0, 2.40484, 0.36535, 3.79276, 0, 0, 0, 4.23202, 2.63742, 0.37963, 0, 0, 0, 0.38793, 0.81874, 0.04095, 0, 0, 0, 0, 0, 0, 1.04847, 0.08983, 0.02608, 0, 0, 0, 0.14408, 0.1637, 0.07754), .Dim = c(6L, 23L), .Dimnames = list(c("Control1", "Control2", "Control3", "Disease1", "Disease2", "Disease3" ), c("Alistipes_finegoldii", "Alistipes_indistinctus", "Alistipes_onderdonkii", "Alistipes_senegalensis", "Bacteroidales_bacterium_ph8", "Bifidobacterium_adolescentis", "Bifidobacterium_dentium", "Collinsella_aerofaciens", "Coprobacter_fastidiosus", "Coprococcus_comes", "Dorea_longicatena", "Eubacterium_hallii", "Eubacterium_rectale", "Fusobacterium_varium", "Lachnospiraceae_bacterium_3_1_46FAA", "Lactobacillus_mucosae", "Megasphaera_micronuciformis", "Odoribacter_splanchnicus", "Roseburia_hominis", "Ruminococcus_bromii", "Ruminococcus_callidus", "Streptococcus_parasanguinis", "Veillonella_atypica")))

df_plot <- sig_speciesstacked %>% 
        # making a data frame from your data
        data.frame() %>% 
        # take the matrix row names (your samples) and put them into a column named 'type'
        rownames_to_column(var = "type") %>% 
        # pivot longer instead of melt
        pivot_longer(-type, names_to = "names", values_to = "value")

ggplot(data = df_plot,
       aes(x = names, y = value, group = type, fill = type)) + 
        geom_bar(stat = "identity", position="stack")

Created on 2019-11-25 by the reprex package (v0.3.0)
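
If the goal is one bar per sample with the species stacked inside it, the same long data can be plotted with the sample column on the x-axis and the species column as the fill. The following is only a minimal sketch under assumptions: toy_mat is a hypothetical two-species stand-in for sig_speciesstacked, and nested calls replace the pipe so that only tibble, tidyr and ggplot2 are needed.

```r
library(tibble)
library(tidyr)
library(ggplot2)

# Hypothetical toy matrix standing in for sig_speciesstacked (samples as row names)
toy_mat <- matrix(c(0.32503, 0.55197, 0, 0,
                    0, 0, 0.14408, 0.1637),
                  nrow = 4,
                  dimnames = list(c("Control1", "Control2", "Disease1", "Disease2"),
                                  c("Alistipes_finegoldii", "Veillonella_atypica")))

# Matrix -> data frame -> sample names into a 'type' column -> long format
df_toy <- as.data.frame(toy_mat)
df_toy <- rownames_to_column(df_toy, var = "type")
df_toy <- pivot_longer(df_toy, -type, names_to = "names", values_to = "value")

# One bar per sample, species stacked within each bar
p <- ggplot(df_toy, aes(x = type, y = value, fill = names)) +
  geom_bar(stat = "identity", position = "stack")
print(p)
```

Swapping x and fill relative to the plot above is purely a presentation choice; the long data frame is identical.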


Update

After clarification and looking at your code again, the solution seems simpler. You were on the right track; you just didn't use the correct names from your 'melted' data for the plot, as @camille pointed out.

The aesthetics (aes) in ggplot need to refer to the column names in your data (sig_speciesstackplot1). As you saw yourself, these are Var1, Var2, and value.

library("tidyverse")
library(reshape2)
#> 
#> Attaching package: 'reshape2'
#> The following object is masked from 'package:tidyr':
#> 
#>     smiths

# Your code
sig_speciesstacked <- structure(c(0.32503, 0.55197, 1.23225, 0, 0, 0, 0.11568, 1.27372, 0.04306, 0, 0, 0, 0.78402, 0.99583, 0.03723, 0, 0, 0, 0.07664, 0.0932, 0.28018, 0, 0, 0, 0.29037, 0.74246, 0.3061, 0, 0, 0, 0.22328, 0.40351, 0.00416, 0, 0, 0, 0, 0, 0, 0.23779, 0.70807, 0.00891, 0.04852, 0.34497, 0.19266, 0, 0, 0, 0.26408, 0.05026, 0.0022, 0, 0, 0, 0.31206, 0.59428, 0.15606, 0, 0, 0, 0.13716, 0.55023, 0.4716, 0, 0, 0, 0.27194, 0.57013, 0.23164, 0, 0, 0, 6.84233, 2.18166, 0.6827, 0, 0, 0, 0, 0, 0, 0.94569, 0.0108, 0.06016, 0.32686, 0.04407, 1.02125, 0, 0, 0, 0, 0, 0, 0.51243, 0.10427, 1.48269, 0, 0, 0, 1.49594, 0.90364, 0.0081, 1.27002, 1.80154, 0.33065, 0, 0, 0, 2.40484, 0.36535, 3.79276, 0, 0, 0, 4.23202, 2.63742, 0.37963, 0, 0, 0, 0.38793, 0.81874, 0.04095, 0, 0, 0, 0, 0, 0, 1.04847, 0.08983, 0.02608, 0, 0, 0, 0.14408, 0.1637, 0.07754), .Dim = c(6L, 23L), .Dimnames = list(c("Control1", "Control2", "Control3", "Disease1", "Disease2", "Disease3" ), c("Alistipes_finegoldii", "Alistipes_indistinctus", "Alistipes_onderdonkii", "Alistipes_senegalensis", "Bacteroidales_bacterium_ph8", "Bifidobacterium_adolescentis", "Bifidobacterium_dentium", "Collinsella_aerofaciens", "Coprobacter_fastidiosus", "Coprococcus_comes", "Dorea_longicatena", "Eubacterium_hallii", "Eubacterium_rectale", "Fusobacterium_varium", "Lachnospiraceae_bacterium_3_1_46FAA", "Lactobacillus_mucosae", "Megasphaera_micronuciformis", "Odoribacter_splanchnicus", "Roseburia_hominis", "Ruminococcus_bromii", "Ruminococcus_callidus", "Streptococcus_parasanguinis", "Veillonella_atypica")))

sig_speciesstackplot1 <- melt(sig_speciesstacked, id.vars = "Samples", variable.name = "species")

# Correct plot
ggplot(sig_speciesstackplot1,
       aes(x=Var1, y=value, fill= Var2))+ 
        geom_bar(stat = "identity", position="stack") +
        theme(legend.position="bottom")

Created on 2019-11-25 by the reprex package (v0.3.0)
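
As a side note on why id.vars and variable.name had no effect: for a matrix, reshape2::melt() names the output columns through its varnames and value.name arguments instead. A minimal sketch, assuming a hypothetical toy matrix (toy_mat) in place of the real data:

```r
library(reshape2)

# Hypothetical numeric matrix with sample names as row names
toy_mat <- matrix(c(0.32503, 0, 0, 0.07754),
                  nrow = 2,
                  dimnames = list(c("Control1", "Disease1"),
                                  c("Alistipes_finegoldii", "Veillonella_atypica")))

# varnames replaces the default Var1 / Var2; value.name replaces "value"
long <- melt(toy_mat, varnames = c("Samples", "species"), value.name = "abundance")
print(names(long))
```

With the columns named this way, aes(x = Samples, y = abundance, fill = species) can be used directly.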


Update 2

If you want them 'stacked' as a percentage, you can use position = "fill" like so:

ggplot(sig_speciesstackplot1,
       aes(x=Var1, y=value, fill= Var2))+ 
        geom_bar(stat = "identity", position="fill") +
        theme(legend.position="bottom")

Created on 2019-11-25 by the reprex package (v0.3.0)
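
To see what position = "fill" draws, the per-sample shares can also be computed by hand. This is only an illustrative sketch on a hypothetical toy matrix (toy_mat); the share column is an assumption added for the example and is not part of the code above.

```r
library(reshape2)

# Hypothetical toy matrix: two samples, two species
toy_mat <- matrix(c(0.32503, 0, 0.11568, 0.14408),
                  nrow = 2,
                  dimnames = list(c("Control1", "Disease1"),
                                  c("Alistipes_finegoldii", "Veillonella_atypica")))

long <- melt(toy_mat)  # columns: Var1 (sample), Var2 (species), value

# Each value divided by the total of its own sample (Var1)
long$share <- long$value / ave(long$value, long$Var1, FUN = sum)

# Shares within every sample sum to 1, which is the bar height under position = "fill"
print(tapply(long$share, long$Var1, sum))
```

A sample whose abundances are all zero would give NaN here, so such rows are worth checking before plotting.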


Update 3

After re-examining the OP's code and the comments below, I want to share the following.

The OP used reshape2::melt() on a matrix with rownames. This issue is discussed here: Why reshape2's Melt cannot capture rownames in the transformation?

Below, I compare the behaviour of reshape2::melt() for a matrix and a data.frame. The latter shows the intended behaviour.

# OP's code
sig_speciesstacked <- structure(c(0.32503, 0.55197, 1.23225, 0, 0, 0, 0.11568, 1.27372, 0.04306, 0, 0, 0, 0.78402, 0.99583, 0.03723, 0, 0, 0, 0.07664, 0.0932, 0.28018, 0, 0, 0, 0.29037, 0.74246, 0.3061, 0, 0, 0, 0.22328, 0.40351, 0.00416, 0, 0, 0, 0, 0, 0, 0.23779, 0.70807, 0.00891, 0.04852, 0.34497, 0.19266, 0, 0, 0, 0.26408, 0.05026, 0.0022, 0, 0, 0, 0.31206, 0.59428, 0.15606, 0, 0, 0, 0.13716, 0.55023, 0.4716, 0, 0, 0, 0.27194, 0.57013, 0.23164, 0, 0, 0, 6.84233, 2.18166, 0.6827, 0, 0, 0, 0, 0, 0, 0.94569, 0.0108, 0.06016, 0.32686, 0.04407, 1.02125, 0, 0, 0, 0, 0, 0, 0.51243, 0.10427, 1.48269, 0, 0, 0, 1.49594, 0.90364, 0.0081, 1.27002, 1.80154, 0.33065, 0, 0, 0, 2.40484, 0.36535, 3.79276, 0, 0, 0, 4.23202, 2.63742, 0.37963, 0, 0, 0, 0.38793, 0.81874, 0.04095, 0, 0, 0, 0, 0, 0, 1.04847, 0.08983, 0.02608, 0, 0, 0, 0.14408, 0.1637, 0.07754), .Dim = c(6L, 23L), .Dimnames = list(c("Control1", "Control2", "Control3", "Disease1", "Disease2", "Disease3" ), c("Alistipes_finegoldii", "Alistipes_indistinctus", "Alistipes_onderdonkii", "Alistipes_senegalensis", "Bacteroidales_bacterium_ph8", "Bifidobacterium_adolescentis", "Bifidobacterium_dentium", "Collinsella_aerofaciens", "Coprobacter_fastidiosus", "Coprococcus_comes", "Dorea_longicatena", "Eubacterium_hallii", "Eubacterium_rectale", "Fusobacterium_varium", "Lachnospiraceae_bacterium_3_1_46FAA", "Lactobacillus_mucosae", "Megasphaera_micronuciformis", "Odoribacter_splanchnicus", "Roseburia_hominis", "Ruminococcus_bromii", "Ruminococcus_callidus", "Streptococcus_parasanguinis", "Veillonella_atypica")))
sig_speciesstacked <- cbind(Samples= rownames(sig_speciesstacked), sig_speciesstacked)
rownames(sig_speciesstacked) <- c("1", "2", "3", "4", "5", "6")

# Using reshape2::melt on a matrix
sig_speciesstackplot1 <- reshape2::melt(sig_speciesstacked,
                              id.vars = "Samples", variable.name = "species")
head(sig_speciesstackplot1)
#>   Var1    Var2    value
#> 1    1 Samples Control1
#> 2    2 Samples Control2
#> 3    3 Samples Control3
#> 4    4 Samples Disease1
#> 5    5 Samples Disease2
#> 6    6 Samples Disease3

# Using reshape2::melt on a data.frame with stringsAsFactors = FALSE
sig_speciesstackplot1 <- reshape2::melt(as.data.frame(sig_speciesstacked,
                                                      stringsAsFactors = F),
                              id.vars = "Samples", variable.name = "species")
head(sig_speciesstackplot1)
#>    Samples              species   value
#> 1 Control1 Alistipes_finegoldii 0.32503
#> 2 Control2 Alistipes_finegoldii 0.55197
#> 3 Control3 Alistipes_finegoldii 1.23225
#> 4 Disease1 Alistipes_finegoldii       0
#> 5 Disease2 Alistipes_finegoldii       0
#> 6 Disease3 Alistipes_finegoldii       0

Created on 2019-11-26 by the reprex package (v0.3.0)
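
One caveat with the data.frame route: because cbind() turned the whole matrix into character, the value column is still character after melt and most likely needs converting before ggplot can stack it as numbers. A minimal sketch, assuming a hypothetical toy character matrix (toy_chr) that mimics the cbind() result:

```r
library(reshape2)
library(ggplot2)

# Hypothetical character matrix mimicking the result of cbind(Samples = ..., matrix)
toy_chr <- cbind(Samples = c("Control1", "Disease1"),
                 Alistipes_finegoldii = c("0.32503", "0"),
                 Veillonella_atypica = c("0", "0.07754"))

long <- melt(as.data.frame(toy_chr, stringsAsFactors = FALSE),
             id.vars = "Samples", variable.name = "species")

# The abundances are still strings at this point; convert them to numbers
long$value <- as.numeric(long$value)

p <- ggplot(long, aes(x = Samples, y = value, fill = species)) +
  geom_bar(stat = "identity", position = "stack")
print(p)
```

Keeping the sample names as row names of a numeric matrix, as the OP's final edit suggests, avoids this conversion altogether.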
