Y axis values are greater than actual values in geom_bar

I am trying to create a bar plot from the following dataset.

library(tidyverse)
library(janitor) 
library(lubridate)

product <- read_csv(
  "https://s3.us-west-2.amazonaws.com/public.gamelab.fun/dataset/Al-Bundy_raw-data.csv"
)

product <- product %>% 
  janitor::clean_names() %>% # this function cleans the names of the variables
  dplyr::rename_all(toupper)


When I run the following code, I get this bar plot:

product %>% 
  count(SIZE_US, GENDER) %>% 
  pivot_wider(
    names_from = "GENDER",
    values_from = "n"
  ) %>% 
  rename_all(toupper) %>%
  replace(is.na(.),0) %>% 
  mutate(
    TOTAL_SALES = FEMALE + MALE
  ) %>% 
  pivot_longer(
    cols = c("FEMALE", "MALE"),
    names_to = "GENDER",
    values_to = "GENDERSALES"
  )%>% 
  ggplot(aes(x=reorder(SIZE_US,as.numeric(SIZE_US)),y= TOTAL_SALES, fill = GENDER))+
  geom_bar(stat = "identity")+
  labs(x = "SHOE SIZE",
       y = "TOTAL SALES",
       title = "SALES OF DIFFERENT SIZES OF SHOE")+
  geom_text(
    aes(label = GENDERSALES), 
    position = position_stack(vjust = 0.5), 
    color = "white", 
    size = 2
  )

But the problem is that the Y axis shows values greater than the actual values in the data. For example, the Y axis of the bar plot goes above 4000, while the highest actual value for the y axis in the data is 2346. I added the following as the last line of the code above:

scale_y_continuous(limits = c(0, 2500), oob = scales::rescale_none)

but then half of the bars in the bar plot extend beyond the plot area.

Stacked bar charts show how a category is divided and how each part relates to the total value. The total height of a bar is the sum of its categories.

In your case there are two categories (Male and Female), and the maximum value is 2346. By definition, a stacked bar chart puts all categories into a single bar. Since TOTAL_SALES is mapped to y on both the FEMALE and the MALE row of each size, every bar stacks that total twice, which is why you see values greater than 4000 on the Y axis.
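To make the doubling concrete, here is a minimal sketch on a toy table. The counts are hypothetical and not taken from the real dataset; only the column names follow the question. It compares the height a stacked bar would reach when TOTAL_SALES is stacked per gender row with the height reached when GENDERSALES is stacked.

```r
library(dplyr)
library(tidyr)

# Hypothetical counts for two shoe sizes (assumed values, not the real data)
toy <- tibble(
  SIZE_US = c("8", "9"),
  FEMALE  = c(100, 40),
  MALE    = c(50, 200)
) %>%
  mutate(TOTAL_SALES = FEMALE + MALE) %>%
  pivot_longer(
    cols = c("FEMALE", "MALE"),
    names_to = "GENDER",
    values_to = "GENDERSALES"
  )

# After pivot_longer, TOTAL_SALES is repeated on both gender rows of a size,
# so stacking it adds the total to itself.
stack_heights <- toy %>%
  group_by(SIZE_US) %>%
  summarise(
    STACK_OF_TOTAL  = sum(TOTAL_SALES),   # what y = TOTAL_SALES stacks to
    STACK_OF_GENDER = sum(GENDERSALES),   # what y = GENDERSALES stacks to
    ACTUAL_TOTAL    = first(TOTAL_SALES)
  )

print(stack_heights)
```

In this toy table STACK_OF_TOTAL is exactly twice ACTUAL_TOTAL for every size, while STACK_OF_GENDER matches it.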

You could solve this issue in two ways. The first is to remove the Y-axis text and just show the relationship:

product %>% 
  count(SIZE_US, GENDER) %>% 
  pivot_wider(
    names_from = "GENDER",
    values_from = "n"
  ) %>% 
  rename_all(toupper) %>%
  replace(is.na(.),0) %>% 
  mutate(
    TOTAL_SALES = FEMALE + MALE
  ) %>% 
  pivot_longer(
    cols = c("FEMALE", "MALE"),
    names_to = "GENDER",
    values_to = "GENDERSALES"
  ) -> plot_data 

plot_data %>% 
  ggplot(aes(x=reorder(SIZE_US,as.numeric(SIZE_US)),y= as.numeric(TOTAL_SALES), fill = GENDER))+
  geom_bar(stat = "identity") +
  labs(x = "SHOE SIZE",
       y = "TOTAL SALES",
       title = "SALES OF DIFFERENT SIZES OF SHOE") +
  geom_text(
    aes(label = GENDERSALES), 
    position = position_stack(vjust = 0.5), 
    color = "white", 
    size = 2
  ) +
  theme(axis.text.y = element_blank(),
        axis.ticks.y = element_blank())

The second is to use a grouped bar chart instead of a stacked bar chart:

plot_data %>% 
  ggplot(aes(x=reorder(SIZE_US,as.numeric(SIZE_US)),y= as.numeric(TOTAL_SALES), fill = GENDER))+
  geom_col(position = "dodge2") +
  labs(x = "SHOE SIZE",
       y = "TOTAL SALES",
       title = "SALES OF DIFFERENT SIZES OF SHOE") +
  geom_text(
    aes(label = GENDERSALES), 
    position = position_dodge2(width = .9), 
    color = "white", 
    size = 2,
    vjust = -0.5 
  )
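As a further option that is not part of the answer above, one could keep the stacked chart and the Y-axis labels by mapping the per-gender column to y, so that each stack adds up to the size's total. The sketch below assumes long-format data with the same columns as plot_data; the toy_long values are hypothetical.

```r
library(ggplot2)
library(tibble)

# Hypothetical long-format data shaped like plot_data (assumed values)
toy_long <- tribble(
  ~SIZE_US, ~GENDER,  ~GENDERSALES,
  "8",      "FEMALE", 100,
  "8",      "MALE",    50,
  "9",      "FEMALE",  40,
  "9",      "MALE",   200
)

p <- ggplot(
  toy_long,
  aes(x = reorder(SIZE_US, as.numeric(SIZE_US)), y = GENDERSALES, fill = GENDER)
) +
  geom_col() +  # stacks the per-gender values; bar height = their sum
  geom_text(
    aes(label = GENDERSALES),
    position = position_stack(vjust = 0.5),
    color = "white",
    size = 2
  ) +
  labs(x = "SHOE SIZE",
       y = "TOTAL SALES",
       title = "SALES OF DIFFERENT SIZES OF SHOE")

# Top of the tallest stacked bar as computed by ggplot2
max_height <- max(layer_data(p, 1)$ymax)
```

With this mapping the tallest bar in the toy data tops out at the largest per-size total (40 + 200), so the axis range matches the data.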
