R - ggplot2 'dodge' geom_step() to overlap geom_bar()

Question

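Plotting counts with ggplot2's geom_bar(stat="identity") is an effective way to visualise them. I would like to use this approach to display my observed counts and compare them with expected counts, by overlaying a step-plot layer on the bars using geom_step.

However, when I do this I run into the problem that the bars are positioned as if dodged by default, while the geom_step line is not. For example, using both a continuous and a discrete dependent variable: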

library(tidyverse)

test <- tibble(a = 1:10, b = runif(10, 1, 10))  # tibble() replaces the deprecated data_frame()

test_plot <- ggplot(test, aes(a, b)) + 
  geom_bar(stat="identity") + 
  geom_step(color = 'red')

test2 <- tibble(a = letters[1:10], b = runif(10, 1, 10))

test2_plot <- ggplot(test2, aes(a, b, group = 1)) + 
  geom_bar(stat="identity") + 
  geom_step(color = 'red')

gridExtra::grid.arrange(test_plot, test2_plot, ncol = 2)

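As you can see, the two layers are offset, which is undesirable.

Reading the documentation, I see that geom_path has a position = option, but trying something like geom_step(color = 'red', position = position_dodge(width = 0.5)) does not do what I want; instead it squeezes the bars and the step line towards the centre. Another option is to adjust the data directly, like geom_step(aes(a-0.5, b), color = 'red'), which gives a nearly acceptable result for data with a continuous dependent variable. You could also compute the step line as a function and plot it with stat_function().

However, these approaches do not work for data with a discrete dependent variable, and my actual data has a discrete dependent variable, so I need another answer.

In addition, once shifted, the step line does not cover the last bar, as the plot above shows. Is there a simple, elegant way to extend it so that it covers the last bar?

If geom_step() is the wrong approach and what I am after can be achieved another way, I am interested in that too.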

Answer 1

I think the most effective way to solve this is to define a custom geom as follows:

library(tidyverse)

geom_step_extend <- function(data, extend = 1, nudge = -0.5,
                             ...) {
  # Function for computing the last segment data
  get_step_extend_data <- function(data, extend = 1, nudge = -0.5) {
    data_out <- as.data.frame(data[order(data[[1]]), ])
    n <- nrow(data)
    max_x_y <- data_out[n, 2]
    if (is.numeric(data_out[[1]])) {
      max_x <- data_out[n, 1] + nudge
    } else {
      max_x <- n + nudge
    }

    data.frame(x = max_x,
               y = max_x_y,
               xend = max_x + extend,
               yend = max_x_y)
  }

  # The resulting geom
  list(
    geom_step(position = position_nudge(x = nudge), ...),
    geom_segment(
      data = get_step_extend_data(data, extend = extend, nudge = nudge),
      mapping = aes(x = x, y = y,
                    xend = xend, yend = yend),
      ...
    )
  )
}

set.seed(111)
test <- tibble(a = 1:10, b = runif(10, 1, 10))
test2 <- tibble(a = letters[1:10], b = runif(10, 1, 10))

test_plot <- ggplot(test, aes(a, b, group = 1)) + 
  geom_bar(stat = "identity") + 
  geom_step_extend(data = test, colour = "red")

test2_plot <- ggplot(test2, aes(a, b, group = 1)) + 
  geom_bar(stat = "identity") + 
  geom_step_extend(data = test2, colour = "red")

gridExtra::grid.arrange(test_plot, test2_plot, ncol = 2)

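Basically, this solution consists of three parts:

Nudging the step curve to the left by the desired amount (-0.5 in this case) using position_nudge;
Computing the data for the missing (rightmost) segment with the function get_step_extend_data. Its behaviour is inspired by ggplot2:::stairstep, the function underlying geom_step;
Combining geom_step and geom_segment into a single geom with list.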
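To make the second part concrete, here is a small base-R sketch of the end-segment calculation pulled out as a standalone function. The name step_extend_segment and the toy data are illustrative assumptions, not part of the answer's code; the logic is meant to mirror get_step_extend_data above.

```r
# Hypothetical standalone copy of the helper, for illustration only.
step_extend_segment <- function(data, extend = 1, nudge = -0.5) {
  # Sort by the x column (first column) so the last row is the rightmost bar.
  data_out <- as.data.frame(data[order(data[[1]]), ])
  n <- nrow(data_out)
  last_y <- data_out[n, 2]
  # Numeric x: start from the last x value.
  # Discrete x: with one row per level, the last level sits at position n.
  last_x <- if (is.numeric(data_out[[1]])) data_out[n, 1] + nudge else n + nudge
  data.frame(x = last_x, y = last_y, xend = last_x + extend, yend = last_y)
}

continuous <- data.frame(a = 1:3, b = c(2, 5, 3))
discrete   <- data.frame(a = letters[1:3], b = c(2, 5, 3))

seg_continuous <- step_extend_segment(continuous)
seg_discrete   <- step_extend_segment(discrete)
```

With these inputs both calls describe a horizontal segment from x = 2.5 to xend = 3.5 at y = 3, which is the width of the last bar.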

Answer 2

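This is a rather crude solution, but it should work in this case.

Create an alternate data frame that expands each row into three points, at x - 0.5, x, and x + 0.5: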

test2 <- data.frame(a = lapply(1:nrow(test), function(x) c(test[x,"a"]-.5, test[x,"a"], test[x, "a"]+0.5)) %>% unlist, 
                b = lapply(1:nrow(test), function(x) rep(test[x,"b"], 3)) %>% unlist)

Draw the outline using geom_line:

ggplot(test, aes(a,b)) + geom_bar(stat="identity", alpha=.7) + geom_line(data=test2, colour="red")

This looks neater if you set the geom_bar width to 1:

ggplot(test, aes(a,b)) + geom_bar(width=1, stat="identity", alpha=.7) + geom_line(data=test2, colour="red")
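The expanded data frame can also be built without lapply. The following is a minimal sketch in base R, assuming a numeric x and bars of width 1; expand_outline and half_width are hypothetical names introduced here, not something from the answer.

```r
# Hypothetical helper: three x positions per bar (left edge, centre, right edge),
# each paired with that bar's height.
expand_outline <- function(x, y, half_width = 0.5) {
  data.frame(
    # rbind() stacks the three positions as rows; as.vector() reads the matrix
    # column by column, so the three points of each bar stay together.
    a = as.vector(rbind(x - half_width, x, x + half_width)),
    b = rep(y, each = 3)
  )
}

outline <- expand_outline(c(1, 2, 3), c(2, 5, 3))
```

The columns are named a and b so that geom_line(data = outline, colour = "red") can inherit aes(a, b) from the plot, as in the calls above.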

Answer 3

As of ggplot2 version 3.3.0, geom_step supports this with direction = "mid":

library(tidyverse)

test <- tibble(a = 1:10, b = runif(10, 1, 10))

test_plot <- ggplot(test, aes(a, b)) + 
  geom_bar(stat="identity") + 
  geom_step(color = 'red', direction = "mid", size = 2)

test_plot
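Since the question's real data has a discrete x, here is a sketch of the same approach on a character column. It assumes ggplot2 3.3.0 or later is installed, and it carries over group = 1 from the question so that the step is drawn as a single path. As far as I can tell, the "mid" line starts and ends at the centres of the first and last bars, so it may not span their full width.

```r
library(ggplot2)

# direction = "mid" needs ggplot2 3.3.0 or later.
stopifnot(utils::packageVersion("ggplot2") >= "3.3.0")

set.seed(111)
test2 <- data.frame(a = letters[1:10], b = runif(10, 1, 10))

# geom_col() is equivalent to geom_bar(stat = "identity").
test2_plot <- ggplot(test2, aes(a, b, group = 1)) +
  geom_col() +
  geom_step(colour = "red", direction = "mid")
```

Printing test2_plot draws the bars with the red step line changing height at the boundaries between adjacent bars.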

Answer 4

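I like molx's answer using direction = 'mid' for geom_step() in ggplot2 version 3.3.0. However, for time series I would suggest shifting the data used for the x-axis of the geom_bar() or geom_col() layer instead: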

library(tidyverse)

data.frame(time = seq(as.POSIXct('2020-10-01 05:00'), 
                      as.POSIXct('2020-10-01 14:00'), by = 'hour'), 
           value = runif(10, 0, 100)) %>%
  # shift the bars by half an hour (in seconds) so each bar spans the following hour
  mutate(time_shift_bars = time + 30*60) %>% 
  ggplot(mapping = aes(y = value)) + 
  geom_step(color  = 'red', mapping = aes(x = time)) +
  geom_col(width = 60*60, mapping = aes(x = time_shift_bars))

![resulting plot](https://i.stack.imgur.com/fJBac.png)

I prefer this because, for example, 09:00 is a specific instant, and the data represents the average over the following hour. If your time-series data is not averaged like this, use the `direction` method.

4 answers

Answer 1
4, accepted, 2017-04-16 08:54:20

Answer 2
2, 2017-04-16 08:53:27

Answer 3
2, 2020-04-22 05:12:45

Answer 4
0, 2021-01-04 06:43:51