
geom_bar ggplot2 stacked, grouped bar plot with positive and negative values - pyramid plot

I don't even know how to properly describe the plot I am trying to generate, which is not a great start. I will first show you my data, and then try to explain and show images that have elements of it.

My data:

   strain condition count.up count.down
1    phbA  balanced      120       -102
2    phbA   limited      114       -319
3    phbB  balanced      122       -148
4    phbB   limited       97       -201
5   phbAB  balanced      268       -243
6   phbAB   limited      140       -189
7    phbC  balanced       55        -65
8    phbC   limited      104       -187
9    phaZ  balanced       99        -28
10   phaZ   limited      147       -205
11   bdhA  balanced      246       -159
12   bdhA   limited      143       -383
13  acsA2  balanced      491       -389
14  acsA2   limited      131       -295

I have seven samples, each in two conditions. For each of these samples, I have the number of genes that are downregulated and the number of genes that are upregulated (count.down and count.up).
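For anyone who wants to reproduce this without countdata.csv, the table above can be typed in directly. This is only a sketch in base R; ordering the strain factor by first appearance is an assumption on my part, so that the x axis would follow the table rather than alphabetical order.

```r
# Recreate the table from the question as a data frame.
df <- data.frame(
  strain     = rep(c("phbA", "phbB", "phbAB", "phbC", "phaZ", "bdhA", "acsA2"), each = 2),
  condition  = rep(c("balanced", "limited"), times = 7),
  count.up   = c(120, 114, 122, 97, 268, 140, 55, 104, 99, 147, 246, 143, 491, 131),
  count.down = c(-102, -319, -148, -201, -243, -189, -65, -187, -28, -205, -159, -383, -389, -295)
)

# Assumption: keep strains in table order instead of alphabetical order.
df$strain <- factor(df$strain, levels = unique(df$strain))

str(df)
```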

I want to plot this so that each sample is grouped, so phbA balanced is dodged beside phbA limited. Each bar would have a portion (representing count.up) on the positive side of the plot, and a portion (representing count.down) on the negative side of the plot.

I want the bars from the 'balanced' condition to be one colour, and the bars from the 'limited' condition to be another. Ideally, there would be two shades of each colour (one for count.up and one for count.down), just to make a visual difference between the two parts of the bar.
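One possible way to get two shades per condition is to fill by the condition/direction pair and supply the four colours by hand. The sketch below assumes signed values in long format; the `direction` and `shade` columns and the hex colours are illustrative choices, not part of the original data.

```r
library(ggplot2)

df <- data.frame(
  strain     = rep(c("phbA", "phbB", "phbAB", "phbC", "phaZ", "bdhA", "acsA2"), each = 2),
  condition  = rep(c("balanced", "limited"), times = 7),
  count.up   = c(120, 114, 122, 97, 268, 140, 55, 104, 99, 147, 246, 143, 491, 131),
  count.down = c(-102, -319, -148, -201, -243, -189, -65, -187, -28, -205, -159, -383, -389, -295)
)

# Long format: one row per strain/condition/direction, values keep their sign.
long <- data.frame(
  strain    = factor(rep(df$strain, 2), levels = unique(df$strain)),
  condition = rep(df$condition, 2),
  direction = rep(c("up", "down"), each = nrow(df)),
  value     = c(df$count.up, df$count.down)
)

# One fill key per condition/direction pair, e.g. "balanced.up".
long$shade <- interaction(long$condition, long$direction)

# Arbitrary colours: a dark and a light shade for each condition.
shades <- c(balanced.up = "#1f78b4", balanced.down = "#a6cee3",
            limited.up  = "#e31a1c", limited.down  = "#fb9a99")

# group = condition makes the dodge depend on condition only, so the
# up and down parts of a bar share the same horizontal position.
p <- ggplot(long, aes(x = strain, y = value, fill = shade, group = condition)) +
  geom_col(position = position_dodge(width = 0.9)) +
  geom_hline(yintercept = 0, colour = "grey40") +
  scale_fill_manual(values = shades)

print(p)
```

Grouping by condition rather than by the fill key is the design choice that matters here: dodging on the fill key would split each strain into four narrow bars instead of two.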

Some images that have elements that I am trying to pull together:

I've also tried to apply some of the pieces of this stackoverflow example, but I can't figure out how to make it work for my data set. I like the positive vs. negative bars here: a single bar that covers both, and the colour differentiation of it. It does not have the grouping of conditions for one sample, or the extra layer of colour coding that differentiates the conditions.

I have tried a bunch of things, and I just can't get it right. I think I am really struggling because a lot of geom_bar examples use data where the plot calculates the counts itself, whereas I am giving it the counts directly. I don't seem to be able to make that distinction successfully in my code; when I switch to stat = "identity", everything gets messy. Any thoughts or suggestions would be very greatly appreciated!
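The difference between counting rows and using precomputed values can be seen on a two-row example. This is a minimal sketch using the phbA rows only; `toy`, `p_count`, and `p_value` are names made up for the illustration.

```r
library(ggplot2)

# Two rows from the question's data (strain phbA).
toy <- data.frame(
  condition = c("balanced", "limited"),
  count.up  = c(120, 114)
)

# Default stat = "count": the bar height is the number of rows per x value.
p_count <- ggplot(toy, aes(x = condition)) + geom_bar()

# geom_col() is geom_bar(stat = "identity"): the bar height is read from y.
p_value <- ggplot(toy, aes(x = condition, y = count.up)) + geom_col()

# Inspect the values each layer would actually draw.
print(layer_data(p_count)$count)
print(layer_data(p_value)$y)
```

With precomputed counts like the ones in this question, the second form is the one that applies.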

Using the link suggested: I've been playing around with that as a template, but I've gotten stuck.

df <- read.csv("countdata.csv", header=T) 
df.m <- melt(df, id.vars = c("strain", "condition")) 
ggplot(df.m, aes(condition)) +
  geom_bar(subset = .(variable == "count.up"),
    aes(y = value, fill = strain), stat = "identity") +
  geom_bar(subset = .(variable == "count.down"),
    aes(y = -value, fill = strain), stat = "identity") +
  xlab("") +
  scale_y_continuous("Export - Import", formatter = "comma")

When I tried to run the ggplot line, it returned an error: could not find function ".". I realized that I did not have dplyr installed/loaded, so I did that. Then I played around a lot and ended up with:

library(ggplot2)
library(reshape2)
library(dplyr)
library(plyr)

df <- read.csv("countdata.csv", header=T)
df.m <- melt(df, id.vars = c("strain", "condition"))

# This is what df.m looks like now (compared with my initial input df, I just
# changed the numbers in Excel so they are all positive). Included so you can
# see what the melt does.
df.m =read.table(text = "
strain condition   variable value
1    phbA  balanced   count.up   120
2    phbA   limited   count.up   114
3    phbB  balanced   count.up   122
4    phbB   limited   count.up    97
5   phbAB  balanced   count.up   268
6   phbAB   limited   count.up   140
7    phbC  balanced   count.up    55
8    phbC   limited   count.up   104
9    phaZ  balanced   count.up    99
10   phaZ   limited   count.up   147
11   bdhA  balanced   count.up   246
12   bdhA   limited   count.up   143
13  acsA2  balanced   count.up   491
14  acsA2   limited   count.up   131
15   phbA  balanced count.down   102
16   phbA   limited count.down   319
17   phbB  balanced count.down   148
18   phbB   limited count.down   201
19  phbAB  balanced count.down   243
20  phbAB   limited count.down   189
21   phbC  balanced count.down    65
22   phbC   limited count.down   187
23   phaZ  balanced count.down    28
24   phaZ   limited count.down   205
25   bdhA  balanced count.down   159 
26   bdhA   limited count.down   383
27  acsA2  balanced count.down   389
28  acsA2   limited count.down   295", header = TRUE)
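To make the effect of melt() concrete, here is a rough base R equivalent on the first two rows. It is a sketch of what the reshape produces, not how reshape2 implements it; `measure` is a helper name introduced only for this example.

```r
# Wide input: one row per strain/condition, one column per measure.
df <- data.frame(
  strain     = c("phbA", "phbA"),
  condition  = c("balanced", "limited"),
  count.up   = c(120, 114),
  count.down = c(102, 319)
)

# Columns to stack; everything else is treated as an id variable.
measure <- c("count.up", "count.down")

# Long output: the id columns are repeated once per measure column,
# 'variable' records which column each value came from.
df.m <- data.frame(
  strain    = rep(df$strain, times = length(measure)),
  condition = rep(df$condition, times = length(measure)),
  variable  = factor(rep(measure, each = nrow(df)), levels = measure),
  value     = unlist(df[measure], use.names = FALSE)
)

print(df.m)
```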

This plots, by strain, the count.up and count.down values under both conditions:

ggplot(df.m, aes(strain)) +
  geom_bar(subset = .(variable == "count.up"),
    aes(y = value, fill = condition), stat = "identity") +
  geom_bar(subset = .(variable == "count.down"),
    aes(y = -value, fill = condition), stat = "identity") +
  xlab("")

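# Left over from the template: inserts a line break into date-like labels.
# The pattern matches nothing in the strain names, and `labels` is not used below.
labels <- gsub("20([0-9]{2})M([0-9]{2})", "\\2\n\\1",
           df.m$strain)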


# this adds a horizontal line at zero to improve readability
last_plot() + geom_hline(yintercept = 0,colour = "grey90")
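A note on the `subset = .(...)` layers above: `.()` is a function from plyr, which is why the call fails until that package is loaded, and as far as I know newer ggplot2 releases no longer accept a `subset` argument in layers at all. A sketch of the same stacked plot that filters with `data =` instead, on a cut-down copy of df.m (the `up` and `down` names are made up here):

```r
library(ggplot2)

# Cut-down copy of the melted data (two strains, all values positive).
df.m <- data.frame(
  strain    = rep(c("phbA", "phbB"), each = 2, times = 2),
  condition = rep(c("balanced", "limited"), times = 4),
  variable  = rep(c("count.up", "count.down"), each = 4),
  value     = c(120, 114, 122, 97, 102, 319, 148, 201)
)

# Filter with base R subset() and hand each piece to its own layer.
up   <- subset(df.m, variable == "count.up")
down <- subset(df.m, variable == "count.down")

p <- ggplot(mapping = aes(x = strain, fill = condition)) +
  geom_col(data = up,   aes(y = value)) +    # positive side
  geom_col(data = down, aes(y = -value)) +   # negative side
  geom_hline(yintercept = 0, colour = "grey90") +
  xlab("")

print(p)
```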

The one thing I have not been able to get working (unfortunately) is displaying the number representing the 'value' inside each bar. I've gotten the numbers to display, but I cannot get them in the right place. I'm going a little crazy!

My data is the same as above; this is where my code is at.

I have looked at a ton of examples showing labels using geom_text on dodged plots. I have been unable to implement any of them successfully. The closest I've gotten is as follows. Any suggestions would be appreciated!

library(ggplot2)
library(reshape2)
library(plyr)
library(dplyr)
df <- read.csv("countdata.csv", header=T)
df.m <- melt(df, id.vars = c("strain", "condition"))
ggplot(df.m, aes(strain), ylim(-500:500)) +
  geom_bar(subset = .(variable == "count.up"),
    aes(y = value, fill = condition), stat = "identity", position = "dodge") +
  geom_bar(subset = .(variable == "count.down"),
    aes(y = -value, fill = condition), stat = "identity", position = "dodge") +
  geom_hline(yintercept = 0, colour = "grey90")

last_plot() + geom_text(aes(strain, value, group=condition, label=label, ymax = 500, ymin= -500), position = position_dodge(width=0.9),size=4)

Which gives this:

[image: enter image description here]

Why will you not align!

I suspect that my issue has to do with how I actually plotted the bars, or with the fact that I am not telling geom_text properly how to position itself. Any thoughts?

Try this. Just as you position the bars with two statements (one for positive, one for negative), position the text in the same way. Then fine-tune the label positions (inside or outside the bar) using vjust. Also, there is no 'label' variable in the data frame; the label, I assume, is value.

library(ggplot2)

## Using your df.m data frame
## ylim() is not an argument of ggplot(); the y range is set with coord_cartesian() below
ggplot(df.m, aes(strain)) +
   geom_bar(data = subset(df.m, variable == "count.up"),
      aes(y = value, fill = condition), stat = "identity", position = "dodge") +
   geom_bar(data = subset(df.m, variable == "count.down"),
      aes(y = -value, fill = condition), stat = "identity", position = "dodge") +
   geom_hline(yintercept = 0, colour = "grey90")


last_plot() +
   geom_text(data = subset(df.m, variable == "count.up"),
      aes(strain, value, group = condition, label = value),
      position = position_dodge(width = 0.9), vjust = 1.5, size = 4) +
   geom_text(data = subset(df.m, variable == "count.down"),
      aes(strain, -value, group = condition, label = value),
      position = position_dodge(width = 0.9), vjust = -.5, size = 4) +
   coord_cartesian(ylim = c(-500, 500))

[image: enter image description here]
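A possible variation on the code above, offered as a sketch rather than a replacement: store the sign in the data so that one bar layer and one text layer cover both directions. The `signed` and `just` columns are hypothetical helpers added for this example, and the data is a cut-down copy of df.m.

```r
library(ggplot2)

# Cut-down copy of the melted data (two strains, all values positive).
df.m <- data.frame(
  strain    = rep(c("phbA", "phbB"), each = 2, times = 2),
  condition = rep(c("balanced", "limited"), times = 4),
  variable  = rep(c("count.up", "count.down"), each = 4),
  value     = c(120, 114, 122, 97, 102, 319, 148, 201)
)

# Negative heights for the down-regulated counts.
df.m$signed <- ifelse(df.m$variable == "count.down", -df.m$value, df.m$value)

# Same vjust values as in the answer: 1.5 puts the label just under the top
# of a positive bar, -0.5 just above the bottom of a negative bar.
df.m$just <- ifelse(df.m$signed >= 0, 1.5, -0.5)

p <- ggplot(df.m, aes(x = strain, y = signed, group = condition)) +
  geom_col(aes(fill = condition), position = position_dodge(width = 0.9)) +
  geom_text(aes(label = value, vjust = just),
            position = position_dodge(width = 0.9), size = 4) +
  geom_hline(yintercept = 0, colour = "grey90") +
  coord_cartesian(ylim = c(-500, 500))

print(p)
```

Because bars and labels share the same `group` and the same dodge width, each label sits over its own bar; the label text is still the unsigned value.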
