Y axis values are greater than actual values in geom_bar

I am trying to create a bar plot from the following dataset.

library(tidyverse)
library(janitor) 
library(lubridate)

product <- read_csv(
  "https://s3.us-west-2.amazonaws.com/public.gamelab.fun/dataset/Al-Bundy_raw-data.csv"
)

product <- product %>% 
  janitor::clean_names() %>% # this function cleans the names of the variables
  dplyr::rename_all(toupper)


When I run the following code, I get a bar plot:

product %>% 
  count(SIZE_US, GENDER) %>% 
  pivot_wider(
    names_from = "GENDER",
    values_from = "n"
  ) %>% 
  rename_all(toupper) %>%
  replace(is.na(.),0) %>% 
  mutate(
    TOTAL_SALES = FEMALE + MALE
  ) %>% 
  pivot_longer(
    cols = c("FEMALE", "MALE"),
    names_to = "GENDER",
    values_to = "GENDERSALES"
  )%>% 
  ggplot(aes(x=reorder(SIZE_US,as.numeric(SIZE_US)),y= TOTAL_SALES, fill = GENDER))+
  geom_bar(stat = "identity")+
  labs(x = "SHOE SIZE",
       y = "TOTAL SALES",
       title = "SALES OF DIFFERENT SIZES OF SHOE")+
  geom_text(
    aes(label = GENDERSALES), 
    position = position_stack(vjust = 0.5), 
    color = "white", 
    size = 2
  )

The problem is that the y axis shows values greater than the actual values in the data. For example, the y axis in the bar plot goes above 4000, but the highest actual value in the data is 2346. I added the following as the last line of the code above:

scale_y_continuous(limits = c(0, 2500), oob = scales::rescale_none)

but then half of the bars fall outside the plot area.

A stacked bar chart shows how a total is divided among categories. The segments are placed on top of each other, so the height of each bar is the sum of the y values of its segments.

In your case there are two categories (FEMALE and MALE), and after pivot_longer() both rows for a given shoe size carry the same TOTAL_SALES value, which is at most 2346. Because y is mapped to TOTAL_SALES, the chart stacks that value once per gender, so the tallest bar reaches 2 × 2346 = 4692. That is why the y axis goes beyond 4000.
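The double counting can be reproduced on a tiny table. The following is a minimal sketch with made-up counts (the `toy` data and the `bar_heights` summary are illustrative names, not part of the question's dataset); it runs the same reshaping steps and compares the height a stacked bar would reach when y is TOTAL_SALES with the height it would reach when y is GENDERSALES.

```r
library(dplyr)
library(tidyr)

# Made-up counts for two shoe sizes; not taken from the real dataset.
toy <- tibble(
  SIZE_US = c("8", "8", "9", "9"),
  GENDER  = c("FEMALE", "MALE", "FEMALE", "MALE"),
  n       = c(30, 70, 40, 10)
)

# Same reshaping as in the question: wide, add the total, back to long.
toy_long <- toy %>%
  pivot_wider(names_from = "GENDER", values_from = "n") %>%
  mutate(TOTAL_SALES = FEMALE + MALE) %>%
  pivot_longer(
    cols = c("FEMALE", "MALE"),
    names_to = "GENDER",
    values_to = "GENDERSALES"
  )

# Stacking sums the y values within each x position.
bar_heights <- toy_long %>%
  group_by(SIZE_US) %>%
  summarise(
    stacked_total  = sum(TOTAL_SALES),  # y = TOTAL_SALES: the total is counted once per gender
    stacked_gender = sum(GENDERSALES),  # y = GENDERSALES: the segments add up to the total
    .groups = "drop"
  )

print(bar_heights)
```

For size 8 the real total is 100, yet stacking TOTAL_SALES gives 200; stacking GENDERSALES gives 100.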

You could solve this issue in two ways. The first is to remove the y-axis text and show only the relationship between the categories:

product %>% 
  count(SIZE_US, GENDER) %>% 
  pivot_wider(
    names_from = "GENDER",
    values_from = "n"
  ) %>% 
  rename_all(toupper) %>%
  replace(is.na(.),0) %>% 
  mutate(
    TOTAL_SALES = FEMALE + MALE
  ) %>% 
  pivot_longer(
    cols = c("FEMALE", "MALE"),
    names_to = "GENDER",
    values_to = "GENDERSALES"
  ) -> plot_data 

plot_data %>% 
  ggplot(aes(x=reorder(SIZE_US,as.numeric(SIZE_US)),y= as.numeric(TOTAL_SALES), fill = GENDER))+
  geom_bar(stat = "identity") +
  labs(x = "SHOE SIZE",
       y = "TOTAL SALES",
       title = "SALES OF DIFFERENT SIZES OF SHOE") +
  geom_text(
    aes(label = GENDERSALES), 
    position = position_stack(vjust = 0.5), 
    color = "white", 
    size = 2
  ) +
  theme(axis.text.y = element_blank(),
        axis.ticks.y = element_blank())

The second is to use a grouped bar chart instead of a stacked bar chart:

plot_data %>% 
  ggplot(aes(x=reorder(SIZE_US,as.numeric(SIZE_US)),y= as.numeric(TOTAL_SALES), fill = GENDER))+
  geom_col(position = "dodge2") +
  labs(x = "SHOE SIZE",
       y = "TOTAL SALES",
       title = "SALES OF DIFFERENT SIZES OF SHOE") +
  geom_text(
    aes(label = GENDERSALES), 
    position = position_dodge2(width = .9), 
    color = "black", # labels sit above the bars here, so white text would be hard to read
    size = 2,
    vjust = -0.5 
  )
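Beyond the two options above, it may be worth trying a stacked chart with y mapped to GENDERSALES rather than TOTAL_SALES, so that the two segments of each bar add up to the total and the y axis keeps its real scale. This is only a sketch, assuming the goal is bars whose height equals TOTAL_SALES; `toy_plot_data` is a made-up data frame in the same long shape as plot_data, not the real sales figures.

```r
library(ggplot2)

# Made-up data in the same long shape as plot_data; not the real sales figures.
toy_plot_data <- data.frame(
  SIZE_US     = c("8", "8", "9", "9"),
  GENDER      = c("FEMALE", "MALE", "FEMALE", "MALE"),
  GENDERSALES = c(30, 70, 40, 10),
  TOTAL_SALES = c(100, 100, 50, 50)
)

p <- ggplot(
  toy_plot_data,
  aes(x = reorder(SIZE_US, as.numeric(SIZE_US)), y = GENDERSALES, fill = GENDER)
) +
  geom_col() +  # geom_col() stacks by default, so each bar tops out at TOTAL_SALES
  geom_text(
    aes(label = GENDERSALES),
    position = position_stack(vjust = 0.5),  # center each label in its segment
    color = "white",
    size = 2
  ) +
  labs(x = "SHOE SIZE",
       y = "TOTAL SALES",
       title = "SALES OF DIFFERENT SIZES OF SHOE")

print(p)
```

With the real data, the same change would amount to replacing `y = TOTAL_SALES` with `y = GENDERSALES` in the original ggplot() call.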
