
Issue with ggplot2, geom_bar, and position="dodge": stacked has correct y values, dodged does not

I'm having a hard time understanding geom_bar() and position="dodge". I was trying to make some bar graphs illustrating two groups. Originally the data came from two separate data frames. Per this question, I put my data in long format. My example:

test <- data.frame(names=rep(c("A","B","C"), 5), values=1:15)
test2 <- data.frame(names=c("A","B","C"), values=5:7)

df <- data.frame(names=c(paste(test$names), paste(test2$names)),
                 num=c(rep(1, nrow(test)), rep(2, nrow(test2))),
                 values=c(test$values, test2$values))

I use that example as it's similar to the spend vs. budget example. Spending has many rows per names factor level, whereas the budget has only one (one budget amount per category).

For a stacked bar plot, this works great:

ggplot(df, aes(x=factor(names), y=values, fill=factor(num))) +
geom_bar(stat="identity")

(Image: stacked plot)

In particular, note the y value maxes. They are the sums of the data from test, with the values of test2 shown in blue on top.

Based on other questions I've read, I simply need to add position="dodge" to make it a side-by-side plot instead of a stacked one:

ggplot(df, aes(x=factor(names), y=values, fill=factor(num))) + 
geom_bar(stat="identity", position="dodge")

(Image: dodged plot)

It looks great, but note the new max y values. It seems like it's just taking the max y value within each names factor level from test as the bar height. It's no longer summing them.
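One way to check what the two plots appear to show is to compute the per-group sums and maxima directly. This is only a minimal base R sketch: it rebuilds df in a compact form (same contents as the example above), and the names sums and maxes are introduced here purely for illustration.

```r
# Same contents as the df built above, written compactly
df <- data.frame(
  names  = c(rep(c("A", "B", "C"), 5), c("A", "B", "C")),
  num    = c(rep(1, 15), rep(2, 3)),
  values = c(1:15, 5:7)
)

# Segment heights in the stacked plot: sum of values per names/num
sums <- aggregate(values ~ names + num, data = df, FUN = sum)

# Apparent bar heights in the dodged plot: max of values per names/num
maxes <- aggregate(values ~ names + num, data = df, FUN = max)

sums$values   # 35 40 45 for num = 1, then 5 6 7 for num = 2
maxes$values  # 13 14 15 for num = 1, then 5 6 7 for num = 2
```

The stacked bars top out at 40, 46 and 52 (sum of both groups per name), while the tallest visible dodged bars match the per-group maxima.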

Per some other questions (like this one and this one), I also tried adding the group= option without success (it produces the same dodged plot as above):

ggplot(df, aes(x=factor(names), y=values, fill=factor(num), group=factor(num))) +
geom_bar(stat="identity", position="dodge")

I don't understand why the stacked version works great while the dodged version doesn't simply put the same bars side by side instead of on top of each other.

ETA: I found a recent question about this on the ggplot Google group with the suggestion to add alpha=0.5 to see what's going on. It isn't that ggplot is taking the max value from each grouping; it's actually over-plotting the bars for each value on top of one another.
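The over-plotting can also be confirmed without looking at the picture. The sketch below assumes ggplot2's layer_data() helper is available (it is in current releases) and rebuilds df compactly; the variable names p and d are illustrative only.

```r
library(ggplot2)

# Same contents as the df built above, written compactly
df <- data.frame(
  names  = c(rep(c("A", "B", "C"), 5), c("A", "B", "C")),
  num    = c(rep(1, 15), rep(2, 3)),
  values = c(1:15, 5:7)
)

# Semi-transparent bars make the overlapping rectangles visible
p <- ggplot(df, aes(x = factor(names), y = values, fill = factor(num))) +
  geom_bar(stat = "identity", position = "dodge", alpha = 0.5)

# Inspect the rectangles ggplot actually draws for this layer
d <- layer_data(p)
nrow(d)                           # one rectangle per input row
length(unique(round(d$xmin, 3)))  # but far fewer distinct horizontal slots
```

If the number of rows is larger than the number of distinct xmin positions, several rectangles share the same slot and are drawn on top of each other.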

It seems that when using position="dodge", ggplot expects only one y per x. I contacted Winston Chang, a ggplot developer, to confirm this and to ask whether it can be changed, as I don't see an advantage.

It seems that stat="identity" should tell ggplot to tally the y=val passed inside aes() instead of the individual counts, which is what happens without stat="identity" and when passing no y value.
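For comparison, here is a small sketch of what the two stats hand to the geom, again using layer_data() and a compact copy of df. As far as I can tell, stat="identity" passes each row through unchanged rather than tallying anything; the summing seen in the stacked plot comes from the stacking position. The names p_count and p_identity are made up for this example.

```r
library(ggplot2)

# Same contents as the df built above, written compactly
df <- data.frame(
  names  = c(rep(c("A", "B", "C"), 5), c("A", "B", "C")),
  num    = c(rep(1, 15), rep(2, 3)),
  values = c(1:15, 5:7)
)

# Default stat ("count"): no y aesthetic, rows are counted per x value
p_count <- ggplot(df, aes(x = factor(names))) +
  geom_bar()
layer_data(p_count)$count     # 6 6 6 (five rows with num = 1 plus one with num = 2)

# stat = "identity": y is used as given, one rectangle per input row
p_identity <- ggplot(df, aes(x = factor(names), y = values)) +
  geom_bar(stat = "identity")
nrow(layer_data(p_identity))  # 18
```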

For now, the workaround seems to be (for the original df above) to aggregate so there's only one y per x:

df2 <- aggregate(df$values, by=list(df$names, df$num), FUN=sum)
p <- ggplot(df2, aes(x=Group.1, y=x, fill=factor(Group.2)))
p <- p + geom_bar(stat="identity", position="dodge")
p

(Image: correct plot)
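A possible alternative to aggregating beforehand is to let ggplot do the summing with stat_summary(). This is not something suggested in the thread, just a sketch under the assumption that ggplot2 3.3.0 or later is installed (older versions use fun.y instead of fun). df is rebuilt compactly so the snippet runs on its own.

```r
library(ggplot2)

# Same contents as the df built above, written compactly
df <- data.frame(
  names  = c(rep(c("A", "B", "C"), 5), c("A", "B", "C")),
  num    = c(rep(1, 15), rep(2, 3)),
  values = c(1:15, 5:7)
)

# Sum values within each names/num combination, then dodge the six bars
p <- ggplot(df, aes(x = factor(names), y = values, fill = factor(num))) +
  stat_summary(fun = sum, geom = "bar", position = "dodge")
p
```

This keeps the raw data frame untouched, at the cost of hiding the aggregation inside the plotting call.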

I think the problem is that you want to stack within values of the num group, and dodge between values of num. It might help to look at what happens when you add an outline to the bars.

library(ggplot2)
set.seed(123)
df <- data.frame(
  id     = 1:18,
  names  = rep(LETTERS[1:3], 6),
  num    = c(rep(1, 15), rep(2, 3)),
  values = sample(1:10, 18, replace=TRUE)
)

By default, there are a lot of bars stacked; you just don't see that they're separate unless you have an outline:

# Stacked bars
ggplot(df, aes(x=factor(names), y=values, fill=factor(num))) + 
  geom_bar(stat="identity", colour="black")

(Image: stacked bars)

If you dodge, you get bars that are dodged between values of num, but there may be multiple bars within each value of num:

# Dodged on 'num', but some overplotted bars
ggplot(df, aes(x=factor(names), y=values, fill=factor(num))) + 
  geom_bar(stat="identity", colour="black", position="dodge", alpha=0.1)

(Image: dodged on num)

If you also add id as a grouping var, it'll dodge all of them:

# Dodging with unique 'id' as the grouping var
ggplot(df, aes(x=factor(names), y=values, fill=factor(num), group=factor(id))) + 
  geom_bar(stat="identity", colour="black", position="dodge", alpha=0.1)

(Image: dodging all bars)

I think what you want is to both dodge and stack, but you can't do both. So the best thing is to summarize the data yourself.

library(plyr)
df2 <- ddply(df, c("names", "num"), summarise, values = sum(values))

ggplot(df2, aes(x=factor(names), y=values, fill=factor(num))) + 
  geom_bar(stat="identity", colour="black", position="dodge")

(Image: summarized beforehand)
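Since plyr has been retired in favour of dplyr, the same summary can probably be written as below. This is a sketch assuming dplyr 1.0.0 or later (for the .groups argument); it repeats the data setup so it runs on its own.

```r
library(dplyr)
library(ggplot2)

# Same example data as in this answer
set.seed(123)
df <- data.frame(
  id     = 1:18,
  names  = rep(LETTERS[1:3], 6),
  num    = c(rep(1, 15), rep(2, 3)),
  values = sample(1:10, 18, replace = TRUE)
)

# One row per names/num combination, with values summed
df2 <- df %>%
  group_by(names, num) %>%
  summarise(values = sum(values), .groups = "drop")

ggplot(df2, aes(x = factor(names), y = values, fill = factor(num))) +
  geom_bar(stat = "identity", colour = "black", position = "dodge")
```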

  library(ggplot2)
  # bar dodge
  ggplot(iris,aes(x=Species,y=Petal.Length))+
    geom_bar(stat="identity",position="dodge",col="red")

  # bar dodge 2
  ggplot(iris,aes(x=Species,y=Petal.Length))+
    geom_bar(stat="identity",position="dodge2",col="red")

  # col dodge 2
  ggplot(iris,aes(x=Species,y=Petal.Length))+
    geom_col(position="dodge2",col="red")

Created on 2022-01-22 by the reprex package (v2.0.1)

Session info

sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.1.0 (2021-05-18)
#>  os       Ubuntu 20.04.3 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language en_GB:en
#>  collate  en_GB.UTF-8
#>  ctype    en_GB.UTF-8
#>  tz       Europe/Stockholm
#>  date     2022-01-22
#>  pandoc   2.14.0.3 @ /usr/lib/rstudio/bin/pandoc/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.1.0)
#>  backports     1.4.1   2021-12-13 [1] CRAN (R 4.1.0)
#>  cli           3.1.0   2021-10-27 [1] CRAN (R 4.1.0)
#>  colorspace    2.0-2   2021-06-24 [1] CRAN (R 4.1.0)
#>  crayon        1.4.2   2021-10-29 [1] CRAN (R 4.1.0)
#>  curl          4.3.2   2021-06-23 [1] CRAN (R 4.1.0)
#>  DBI           1.1.2   2021-12-20 [1] CRAN (R 4.1.0)
#>  digest        0.6.29  2021-12-01 [1] CRAN (R 4.1.0)
#>  dplyr         1.0.7   2021-06-18 [1] CRAN (R 4.1.0)
#>  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.1.0)
#>  evaluate      0.14    2019-05-28 [1] CRAN (R 4.1.0)
#>  fansi         1.0.2   2022-01-14 [1] CRAN (R 4.1.0)
#>  farver        2.1.0   2021-02-28 [1] CRAN (R 4.1.0)
#>  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.1.0)
#>  fs            1.5.2   2021-12-08 [1] CRAN (R 4.1.0)
#>  generics      0.1.1   2021-10-25 [1] CRAN (R 4.1.0)
#>  ggplot2     * 3.3.5   2021-06-25 [1] CRAN (R 4.1.0)
#>  glue          1.6.0   2021-12-17 [1] CRAN (R 4.1.0)
#>  gtable        0.3.0   2019-03-25 [1] CRAN (R 4.1.0)
#>  highr         0.9     2021-04-16 [1] CRAN (R 4.1.0)
#>  htmltools     0.5.2   2021-08-25 [1] CRAN (R 4.1.0)
#>  httr          1.4.2   2020-07-20 [1] CRAN (R 4.1.0)
#>  knitr         1.37    2021-12-16 [1] CRAN (R 4.1.0)
#>  labeling      0.4.2   2020-10-20 [1] CRAN (R 4.1.0)
#>  lifecycle     1.0.1   2021-09-24 [1] CRAN (R 4.1.0)
#>  magrittr      2.0.1   2020-11-17 [1] CRAN (R 4.1.0)
#>  mime          0.12    2021-09-28 [1] CRAN (R 4.1.0)
#>  munsell       0.5.0   2018-06-12 [1] CRAN (R 4.1.0)
#>  pillar        1.6.4   2021-10-18 [1] CRAN (R 4.1.0)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.1.0)
#>  purrr         0.3.4   2020-04-17 [1] CRAN (R 4.1.0)
#>  R.cache       0.15.0  2021-04-30 [1] CRAN (R 4.1.0)
#>  R.methodsS3   1.8.1   2020-08-26 [1] CRAN (R 4.1.0)
#>  R.oo          1.24.0  2020-08-26 [1] CRAN (R 4.1.0)
#>  R.utils       2.11.0  2021-09-26 [1] CRAN (R 4.1.0)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.1.0)
#>  reprex        2.0.1   2021-08-05 [1] CRAN (R 4.1.0)
#>  rlang         0.4.12  2021-10-18 [1] CRAN (R 4.1.0)
#>  rmarkdown     2.11    2021-09-14 [1] CRAN (R 4.1.0)
#>  rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.1.0)
#>  scales        1.1.1   2020-05-11 [1] CRAN (R 4.1.0)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.1.0)
#>  stringi       1.7.6   2021-11-29 [1] CRAN (R 4.1.0)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.1.0)
#>  styler        1.6.2   2021-09-23 [1] CRAN (R 4.1.0)
#>  tibble        3.1.6   2021-11-07 [1] CRAN (R 4.1.0)
#>  tidyselect    1.1.1   2021-04-30 [1] CRAN (R 4.1.0)
#>  utf8          1.2.2   2021-07-24 [1] CRAN (R 4.1.0)
#>  vctrs         0.3.8   2021-04-29 [1] CRAN (R 4.1.0)
#>  withr         2.4.3   2021-11-30 [1] CRAN (R 4.1.0)
#>  xfun          0.29    2021-12-14 [1] CRAN (R 4.1.0)
#>  xml2          1.3.3   2021-11-30 [1] CRAN (R 4.1.0)
#>  yaml          2.2.1   2020-02-01 [1] CRAN (R 4.1.0)
#>
#>  [1] /home/roy/miniconda3/envs/r-4.1/lib/R/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
