ggplot2 boxplot medians aren't plotting as expected

Question

I have a fairly large dataset (Dropbox: csv file) that I'm trying to plot with geom_boxplot. The following produces what looks like a plausible plot:

require(reshape2)
require(ggplot2)
require(scales)
require(grid)
require(gridExtra)

df <- read.csv("\\Downloads\\boxplot.csv", na.strings = "*")
df$year <- factor(df$year, levels = c(2010,2011,2012,2013,2014), labels = c(2010,2011,2012,2013,2014))

d <- ggplot(data = df, aes(x = year, y = value)) +
    geom_boxplot(aes(fill = station)) + 
    facet_grid(station~.) +
    scale_y_continuous(limits = c(0, 15)) + 
    theme(legend.position = "none")
d

However, problems creep in when you dig a little deeper. When I label the boxplot medians with their values, I get the following plot.

df.m <- aggregate(value~year+station, data = df, FUN = function(x) median(x))
d <- d + geom_text(data = df.m, aes(x = year, y = value, label = value)) 
d

[Figure: boxplot with median labels]

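The medians drawn by geom_boxplot are not at the medians at all. The labels are plotted at the correct y-axis value, but the middle hinge of each boxplot is definitely not at the median. I've been stumped by this for a few days now.

What is the reason for this? How can this type of display be produced with the correct medians? How can this plot be debugged or diagnosed?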

Answer 1

The answer to this question lies in the use of scale_y_continuous. ggplot2 performs operations in the following order:

1. Scale transformation
2. Statistical computation
3. Coordinate transformation

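In this case, because a scale transformation is invoked, ggplot2 excludes the data that fall outside the scale limits before it computes the boxplot hinges. The medians calculated by the aggregate function and used in the geom_text call, however, are based on the entire dataset. This can result in the middle hinges and the text labels not matching.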
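The effect can be reproduced without any plotting. The following is a minimal sketch in base R; the vector `value` is made-up data chosen for illustration and is not taken from the question's csv file. It mimics what the scale limits do (out-of-range values are treated as missing before the statistics are computed) and compares the result with the median of the full data.

```r
# Made-up values for illustration only (an assumption, not the real data).
value <- c(1, 2, 3, 4, 20, 30, 40)

# What aggregate(..., FUN = median) sees: the entire dataset.
full_median <- median(value)

# What the boxplot statistic sees after scale_y_continuous(limits = c(0, 15)):
# values outside the limits are replaced with NA and dropped.
censored <- ifelse(value >= 0 & value <= 15, value, NA)
censored_median <- median(censored, na.rm = TRUE)

print(c(full = full_median, censored = censored_median))
```

With these values the full median is 4, while the median of the values that survive the limits is 2.5, which is the same kind of gap seen between the text labels and the middle hinges.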

The solution is to omit the scale_y_continuous call and use this instead:

d <- ggplot(data = df, aes(x = year, y = value)) +
    geom_boxplot(aes(fill = station)) + 
    facet_grid(station~.) +
    theme(legend.position = "none") +
    coord_cartesian(ylim = c(0, 15))

This lets ggplot2 compute the boxplot hinge statistics from the entire dataset while only restricting the visible plotting region.
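As for diagnosing such a plot, one option is to inspect the values ggplot2 actually computed for the layer. The sketch below uses `layer_data()` from ggplot2 on a small made-up data frame (`toy`, `p_scale` and `p_coord` are hypothetical names, and the data are not from the question) and compares the `middle` column, which holds the median line of each box, under the two approaches.

```r
library(ggplot2)

# Made-up data for illustration only: two years with identical values.
toy <- data.frame(
  year  = factor(rep(c(2010, 2011), each = 7)),
  value = rep(c(1, 2, 3, 4, 20, 30, 40), times = 2)
)

# Limits set on the scale: out-of-range rows are dropped before the stat runs.
p_scale <- ggplot(toy, aes(x = year, y = value)) +
  geom_boxplot() +
  scale_y_continuous(limits = c(0, 15))

# Limits set on the coordinate system: the stat sees every row.
p_coord <- ggplot(toy, aes(x = year, y = value)) +
  geom_boxplot() +
  coord_cartesian(ylim = c(0, 15))

# layer_data() returns the computed data for a layer (here layer 1).
# suppressWarnings() hides the "removed rows" warning from the scale version.
mid_scale <- suppressWarnings(layer_data(p_scale, 1)$middle)
mid_coord <- layer_data(p_coord, 1)$middle

print(data.frame(year = levels(toy$year), mid_scale, mid_coord))
```

If the two columns differ, as they should for this toy data, the scale limits are discarding observations. The warning about removed rows that ggplot2 emits when drawing the first plot is another hint worth checking.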

Answer 1 (7, accepted, 2015-04-09 13:40:45)