簡體   English   中英

使用ggplot繪制線形圖和條形圖(帶有輔助軸的線圖)

[英]Plot line and bar graph (with secondary axis for line graph) using ggplot

問題

兩天前我才剛開始R。 我已經看過一些基本的R教程,並且能夠繪制二維數據。 我從Oracle數據庫中提取數據。 現在,當我嘗試使用輔助軸合並兩種圖形類型(線形和條形)時遇到問題。

我沒有問題,在Excel上繪制此數據。 以下是情節:

在此處輸入圖片說明

我無法在R上繪制它。我搜索了一些相關示例,但無法根據我的要求對其進行調整(在ggplot2中組合條形圖和折線圖(雙軸)

以下是我用來分別繪制條形圖和折線圖的代碼:

酒吧:

p <- ggplot(data = df, aes(x = MONTHS, y = BASE)) + 
    geom_bar(stat="identity") + 
    theme_minimal() +
    geom_text(aes(label = BASE), vjust = 1.6, color = "White", size = 2.5)

線:

p1 <- ggplot(data = df, aes(x = MONTHS, y = df$INTERNETPERCENTAGE, group = 1)) + 
    geom_line() + 
    geom_point()

數據

更新:我有以下數據(原始數據已清除“,”和“%”符號):

> dput(head(df,20))
structure(list(MONTHS = structure(c(11L, 10L, 3L, 5L, 4L, 8L, 
1L, 9L, 7L, 6L, 2L, 13L, 12L), .Label = c("Apr-18", "Aug-18", 
"Dec-17", "Feb-18", "Jan-18", "Jul-18", "Jun-18", "Mar-18", "May-18", 
"Nov-17", "Oct-17", "Oct-18", "Sep-18"), class = "factor"), BASE = c(40756228L, 
41088219L, 41642601L, 42017111L, 42439446L, 42847468L, 43375319L, 
43440484L, 43464735L, 43326823L, 43190949L, 43015301L, 42780071L
), INTERNETUSERGREATERTHAN0KB = c(13380576L, 13224502L, 14044105L, 
14239169L, 14011423L, 14736043L, 14487827L, 14460410L, 14632695L, 
14896654L, 15019329L, 14141766L, 14209288L), INTERNETPERCENTAGE = c(33L, 
32L, 34L, 34L, 33L, 34L, 33L, 33L, 34L, 34L, 35L, 33L, 33L), 
    SMARTPHONE = c(11610216L, 11875033L, 12225965L, 12412010L, 
    12760251L, 12781082L, 13142400L, 13295826L, 13422476L, 13408216L, 
    13504339L, 13413596L, 13586438L), SMARTPHONEPERCENTAGE = c(28L, 
    29L, 29L, 30L, 30L, 30L, 30L, 31L, 31L, 31L, 31L, 31L, 32L
    ), INTERNETUSAGEGREATERTHAN0KB4G = c(829095L, 969531L, 1181411L, 
    1339620L, 1474300L, 1733027L, 1871816L, 1967129L, 2117418L, 
    2288215L, 2453243L, 2624865L, 2817199L)), row.names = c(NA, 
13L), class = "data.frame")

ggplot是一個“高級”繪圖庫,這意味着它是為表達數據中的明確關系而構建的,而不是一個用於繪制形狀的簡單系統。 它的基本假設之一是,輔助或雙數據軸通常不是一個好主意。 這樣的圖形在同一空間中繪制了多個關系,而不能保證兩個軸實際上共享有意義的關系(例如,參見虛假相關 )。

綜上所述ggplot確實具有定義輔助軸的能力,盡管故意將其用於您描述的目的。 實現目標的一種方法是將數據集分成兩個單獨的數據集,然后將它們繪制在同一ggplot對象中。 當然可以,但是請注意要產生想要的效果需要多少額外的代碼。

library(tidyverse)
library(scales)

df.base <- df[c('MONTHS', 'BASE')] %>% 
  mutate(MONTHS = factor(MONTHS, MONTHS, ordered = T))

df.percent <- df[c('MONTHS', 'INTERNETPERCENTAGE', 'SMARTPHONEPERCENTAGE')] %>% 
  gather(variable, value, -MONTHS)

g <- ggplot(data = df.base, aes(x = MONTHS, y = BASE)) +
  geom_col(aes(fill = 'BASE')) +
  geom_line(data = df.percent, aes(x = MONTHS, y = value / 40 * 12500000 + 33500000, color = variable, group = variable)) +
  geom_point(data = df.percent, aes(x = MONTHS, y = value / 40 * 12500000 + 33500000, color = variable)) +
  geom_label(data = df.percent, aes(x = MONTHS, y = value / 40 * 12500000 + 33500000, fill = variable, label = sprintf('%i%%', value)), color = 'white', vjust = 1.6, size = 4) +
  scale_y_continuous(sec.axis = sec_axis(~(. - 33500000) / 12500000 * 40, name = 'PERCENT'), labels = comma) +
  scale_fill_manual(values = c('lightblue', 'red', 'darkgreen')) +
  scale_color_manual(values = c('red', 'darkgreen')) +
  coord_cartesian(ylim = c(33500000, 45500000)) +
  labs(fill = NULL, color = NULL) +
  theme_minimal()
print(g)

在此處輸入圖片說明

請注意,我的答案是基於您原始的“未清除”數據(我將其附加在文章的底部)。

此處的關鍵是轉換百分比值,以使它們使用與BASE相同的范圍。 然后,我們應用變換的逆函數將原始百分比值顯示為第二個y軸。

一個(個人)警告語:次要軸通常不是一個好主意 就個人而言,我將使用構面或兩個單獨的圖,以避免圖的混亂和重載。 還要注意,Hadley本人並不喜歡雙Y軸 ,因此(明智地)限制了ggplot2支持。

除此之外,這是一個選擇:

  1. 首先,讓我們清理數據(刪除千位分隔符,百分號等)。

     library(tidyverse) df.clean <- df %>% mutate_if(is.factor, as.character) %>% gather(USAGE, PERCENTAGE, INTERNETPERCENTAGE, SMARTPHONEPERCENTAGE) %>% mutate( MONTHS = factor(MONTHS, levels = df$MONTHS), BASE = as.numeric(str_replace_all(BASE, ",", "")), PERCENTAGE = as.numeric(str_replace(PERCENTAGE, "%", ""))) 
  2. 現在,我們計算變換系數:

     y1 <- min(df.clean$BASE) y2 <- max(df.clean$BASE) x1 <- min(df.clean$PERCENTAGE) x2 <- max(df.clean$PERCENTAGE) b <- (y2 - y1) / (x2 - x1) a <- y1 - b * x1 
  3. 現在進行繪圖:

     df.clean %>% mutate(perc.scaled = a + b * PERCENTAGE) %>% ggplot(aes(MONTHS, BASE)) + geom_col( data = df.clean %>% distinct(MONTHS, .keep_all = TRUE), aes(MONTHS, BASE), fill = "dodgerblue4") + geom_point(aes(MONTHS, perc.scaled, colour = USAGE, group = USAGE)) + geom_line(aes(MONTHS, perc.scaled, colour = USAGE, group = USAGE)) + geom_label( aes(MONTHS, perc.scaled, label = PERCENTAGE, fill = USAGE), vjust = 1.4, show.legend = F) + scale_y_continuous( name = "BASE", sec.axis = sec_axis(~ (. - a) / b, name = "Percentage")) + coord_cartesian(ylim = c(0.99 * min(df.clean$BASE), max(df.clean$BASE))) + theme_minimal() + theme(legend.position = "bottom") 

在此處輸入圖片說明


樣本數據

df <- structure(list(MONTHS = structure(c(11L, 10L, 3L, 5L, 4L, 8L,
1L, 9L, 7L, 6L, 2L, 13L, 12L), .Label = c("APR-18", "AUG-18",
"DEC-17", "FEB-18", "JAN-18", "JUL-18", "JUN-18", "MAR-18", "MAY-18",
"NOV-17", "OCT-17", "OCT-18", "SEP-18"), class = "factor"), BASE = structure(c(1L,
2L, 3L, 4L, 5L, 7L, 11L, 12L, 13L, 10L, 9L, 8L, 6L), .Label = c("40,756,228",
"41,088,219", "41,642,601", "42,017,111", "42,439,446", "42,780,071",
"42,847,468", "43,015,301", "43,190,949", "43,326,823", "43,375,319",
"43,440,484", "43,464,735"), class = "factor"), INTERNETUSERGREATERTHAN0KB = structure(c(2L,
1L, 4L, 7L, 3L, 11L, 9L, 8L, 10L, 12L, 13L, 5L, 6L), .Label = c("13,224,502",
"13,380,576", "14,011,423", "14,044,105", "14,141,766", "14,209,288",
"14,239,169", "14,460,410", "14,487,827", "14,632,695", "14,736,043",
"14,896,654", "15,019,329"), class = "factor"), INTERNETPERCENTAGE = structure(c(2L,
1L, 3L, 3L, 2L, 3L, 2L, 2L, 3L, 3L, 4L, 2L, 2L), .Label = c("32%",
"33%", "34%", "35%"), class = "factor"), SMARTPHONE = structure(c(1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 11L, 9L, 12L, 10L, 13L), .Label = c("11,610,216",
"11,875,033", "12,225,965", "12,412,010", "12,760,251", "12,781,082",
"13,142,400", "13,295,826", "13,408,216", "13,413,596", "13,422,476",
"13,504,339", "13,586,438"), class = "factor"), SMARTPHONEPERCENTAGE = structure(c(1L,
2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L), .Label = c("28%",
"29%", "30%", "31%", "32%"), class = "factor"), INTERNETUSAGEGREATERTHAN0KB4G = structure(c(12L,
13L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label = c("1,181,411 ",
"1,339,620 ", "1,474,300 ", "1,733,027 ", "1,871,816 ", "1,967,129 ",
"2,117,418 ", "2,288,215 ", "2,453,243 ", "2,624,865 ", "2,817,199 ",
"829,095 ", "969,531 "), class = "factor")), row.names = c(NA,
13L), class = "data.frame")

您需要具有一個類似於最大y軸1最大y軸2之比的變換因子 在此, 次要y軸應比主要y軸小100,000倍 因此:

ggplot(df) + 
    geom_col(aes(x = MONTHS, y = BASE)) +
    # apply transformation factor to line plot
    geom_line(aes(x = MONTHS, y = INTERNETPERCENTAGE/0.000001, group = 1), 
              color = "red", size = 1) +
    theme_minimal() +
    geom_text(aes(x = MONTHS, y = BASE, label=BASE), 
              vjust=1.6, color="White", size=2.5) +
    # add secondary y-axis that is 100,000 times smaller
    scale_y_continuous(sec.axis = sec_axis(~.*0.000001, name = "Internet Percentage in %")) +
    labs(y = "Base", x = "Months")

1個

數據

df <- structure(list(MONTHS = structure(c(17440, 17471, 17501, 17532, 17563, 17591, 17622, 17652, 17683, 17713, 17744, 17775, 17805), class = "Date"), BASE = c(40756228L, 41088219L, 41642601L, 42017111L, 42439446L, 42847468L, 43375319L, 43440484L, 43464735L, 43326823L, 43190949L, 43015301L, 42780071L), INTERNETUSERGREATERTHAN0KB = c(13380576L, 13224502L, 14044105L, 14239169L, 14011423L, 14736043L, 14487827L, 14460410L, 14632695L, 14896654L, 15019329L, 14141766L, 14209288L), INTERNETPERCENTAGE = c(33L, 32L, 34L, 34L, 33L, 34L, 33L, 33L, 34L, 34L, 35L, 33L, 33L), SMARTPHONE = c(11610216L, 11875033L, 12225965L, 12412010L, 12760251L, 12781082L, 13142400L, 13295826L, 13422476L, 13408216L, 13504339L, 13413596L, 13586438L), SMARTPHONEPERCENTAGE = c(28L, 29L, 29L, 30L, 30L, 30L, 30L, 31L, 31L, 31L, 31L, 31L, 32L), INTERNETUSAGEGREATERTHAN0KB4G = c(829095L, 969531L, 1181411L, 1339620L, 1474300L, 1733027L, 1871816L, 1967129L, 2117418L, 2288215L, 2453243L, 2624865L, 2817199L)), row.names = c(NA, 13L), class = "data.frame")

說明

次要y軸只是視覺上的 ggplot在第一個y軸上繪制我們的geom_line() (值大約為33,000,000)。 輔助y軸稍后添加。 您可以查看是否簽出

> ggplot_build(p)[[1]][[2]]
          y     x group PANEL colour size linetype alpha
1  33000000 17440     1     1    red    1        1    NA
2  32000000 17471     1     1    red    1        1    NA
3  34000000 17501     1     1    red    1        1    NA
4  34000000 17532     1     1    red    1        1    NA
5  33000000 17563     1     1    red    1        1    NA
6  34000000 17591     1     1    red    1        1    NA
7  33000000 17622     1     1    red    1        1    NA
8  33000000 17652     1     1    red    1        1    NA
9  34000000 17683     1     1    red    1        1    NA
10 34000000 17713     1     1    red    1        1    NA
11 35000000 17744     1     1    red    1        1    NA
12 33000000 17775     1     1    red    1        1    NA
13 33000000 17805     1     1    red    1        1    NA

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM