Grouped bar plot in ggplot2

I am trying to make a grouped bar chart with data in long form.

Here is the data:

pivot_sample <- structure(list(group = c("group1", "group2", "group3", "group1", 
"group2", "group1", "group1", "group1", "group4", "group1", "group4", 
"group4", "group1", "group4", "group1", "group1", "group2", "group1", 
"group4", "group2", "group4", "group2", "group3", "group3", "group1", 
"group1", "group3", "group3", "group1", "group1", "group3", "group1", 
"group4", "group3", "group3", "group1", "group2", "group1", "group4", 
"group1", "group3", "group3", "group3", "group2", "group2", "group4", 
"group3", "group3", "group3", "group2", "group3", "group2", "group1", 
"group1", "group3", "group1", "group1", "group2", "group4", "group1", 
"group4", "group1", "group1", "group4", "group1", "group3", "group4", 
"group1", "group4", "group2", "group4", "group1", "group2", "group4", 
"group1", "group4", "group1", "group2", "group1", "group1", "group1", 
"group1", "group2", "group1", "group3", "group1", "group1", "group1", 
"group3", "group4", "group1", "group3", "group1", "group3", "group4", 
"group1", "group2", "group1", "group3", "group1"), category = c("category4", 
"category5", "category2", "category4", "category3", "category6", 
"category3", "category1", "category4", "category2", "category6", 
"category6", "category5", "category5", "category4", "category4", 
"category1", "category6", "category1", "category4", "category6", 
"category6", "category2", "category6", "category3", "category2", 
"category6", "category3", "category6", "category1", "category6", 
"category2", "category2", "category2", "category5", "category1", 
"category1", "category4", "category3", "category4", "category4", 
"category5", "category1", "category3", "category5", "category2", 
"category2", "category5", "category5", "category2", "category6", 
"category6", "category5", "category1", "category4", "category3", 
"category6", "category1", "category6", "category3", "category2", 
"category2", "category3", "category2", "category2", "category5", 
"category4", "category4", "category4", "category4", "category1", 
"category5", "category6", "category5", "category4", "category5", 
"category1", "category2", "category3", "category5", "category3", 
"category2", "category4", "category6", "category4", "category6", 
"category1", "category4", "category4", "category3", "category4", 
"category5", "category5", "category6", "category4", "category3", 
"category5", "category3", "category3", "category1"), count = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 
0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0)), row.names = c(NA, 
-100L), class = c("tbl_df", "tbl", "data.frame"))

When I run the following:

library(ggplot2)
library(dplyr)  # provides the %>% pipe

pivot_sample %>%
  ggplot(aes(x=group,fill=category))+
  geom_bar()

The default stat_count() seems to work just fine with the default position="stack". However, when I switch to position="dodge" in the code below:

pivot_sample %>% 
  ggplot(aes(x=group,y=count,fill=category))+
  geom_bar(position = "dodge",stat = "identity")

It won't count the count variable.

I am sure there is something basic I am missing and could use another perspective. Do I need to use a count function for the y= argument in aes()?

All help would be appreciated!

OP, the simple answer here is to add position="dodge" to your original plot code. That is enough to separate the bars according to the group aesthetic (which is not specified, so ggplot2 derives the grouping from the discrete variables in the plot, here the fill aesthetic):

pivot_sample %>%
  ggplot(aes(x=group, fill=category)) +
  geom_bar(position='dodge')

The reason is that the default for the stat argument in geom_bar() is stat="count". This counts the observations at each x position (split by group) and plots that count on the y axis. You can refer to the computed value with the .. notation, ..count.., but that is not necessary with geom_bar(). The code below is a longer form that produces the same plot:

pivot_sample %>%
  ggplot(aes(x=group, fill=category)) +
  geom_bar(position='dodge', aes(y=..count..), stat="count")

Note that your data frame has a column called "count", but pivot_sample$count is not what is accessed when you use ..count.. in the mapping. What is accessed there is the value computed by stat="count".
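As a minimal sketch of the point above, the example below uses after_stat(count), which ggplot2 3.3.0 and later offer as the replacement for the ..count.. notation (deprecated as of 3.4.0). The toy data frame and its values are made up for illustration, and layer_data() is used only to inspect what the stat computed.

```r
library(ggplot2)

# Hypothetical toy data (not the OP's pivot_sample). It deliberately has a
# column named `count` to show that after_stat(count) does not read it.
toy <- data.frame(
  group    = c("g1", "g1", "g1", "g2"),
  category = c("a", "a", "b", "a"),
  count    = c(0, 1, 0, 0)
)

p <- ggplot(toy, aes(x = group, fill = category)) +
  geom_bar(aes(y = after_stat(count)), position = "dodge")

# Values computed by stat_count() for the first (and only) layer:
# one row per group/category combination, with the number of observations.
computed <- layer_data(p)
computed[, c("x", "group", "count")]
```

The computed counts add up to the number of rows in the data, regardless of what the data column named count contains.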

What happened when you used stat="identity"? The "identity" stat leaves the data as is, so each row's actual value is plotted on the y axis. You specified y=count, which means every value of pivot_sample$count was drawn as its own bar for its group and category. geom_bar() with stat="identity" is the same as geom_col() (which should be used in that case), and it requires both the x and y aesthetics to be defined. The stat itself does not sum anything. With the default position="stack", bars that share the same x and fill are stacked, so the total height equals the sum of the y values. With position="dodge", those bars are drawn on top of each other, so the visible height is the largest value.

In the plot you showed using stat="identity", the bar heights therefore come from the values of pivot_sample$count for each group and category, not from the number of rows. Only 8 of the 100 values in that column are 1 and the rest are 0, which is why the plot looks the way it does.
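To make the effect of the position adjustment concrete, here is a small sketch that compares the drawn bar heights for geom_col() under position="stack" and position="dodge". The toy data is an assumption for illustration: two rows share the same group and category, which is the situation in long-form data like yours.

```r
library(ggplot2)

# Hypothetical toy data: rows 1 and 2 share the same group and category.
toy <- data.frame(
  group    = c("g1", "g1", "g1"),
  category = c("a", "a", "b"),
  count    = c(1, 1, 1)
)

base <- ggplot(toy, aes(x = group, y = count, fill = category))

stacked <- base + geom_col(position = "stack")
dodged  <- base + geom_col(position = "dodge")

# Top edge of the tallest bar under each position adjustment.
stack_top <- max(layer_data(stacked)$ymax)  # segments pile up on each other
dodge_top <- max(layer_data(dodged)$ymax)   # the two "a" rows overlap
c(stack = stack_top, dodge = dodge_top)
```

With stacking, the heights of all rows at the same x add up. With dodging, the two rows for category "a" occupy the same slot, so no bar is taller than the largest single value.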

Note that geom_bar() with stat="count" counts observations, whereas stat="identity" plots the values themselves.
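If what you actually want is one dodged bar per group and category whose height is the total of the count column, one option (a sketch, not the only way) is to sum the values first and then plot the result with geom_col(). The toy data and the column name total are made up for illustration; with your data you would start from pivot_sample instead.

```r
library(ggplot2)
library(dplyr)

# Hypothetical toy data in the same long form as pivot_sample.
toy <- data.frame(
  group    = c("g1", "g1", "g1", "g2"),
  category = c("a", "a", "b", "a"),
  count    = c(1, 1, 0, 1)
)

# One row per group/category combination, holding the summed count.
totals <- toy %>%
  group_by(group, category) %>%
  summarise(total = sum(count), .groups = "drop")

# Each bar now maps to exactly one row, so dodging shows the totals.
p <- ggplot(totals, aes(x = group, y = total, fill = category)) +
  geom_col(position = "dodge")
```

Because every bar corresponds to a single row after the aggregation, nothing overlaps and the bar height equals the total for that combination.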
