ggplot: weighted.mean and stat_summary in a faceted bar plot

Question

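I have spent too much time trying to find a way to use weighted.mean (or wtd.mean) inside stat_summary and get it to work. I have gone through several pages that tackle the same problem, but none has a clear solution. The main issue is that weighted.mean, once placed inside stat_summary, cannot find its weights, which apparently cannot be passed down from the ggplot and/or stat_summary aesthetics (believe me, I tried; see the examples). I have tried various approaches, and even generated a bar plot of weighted means with a ddply-based function (as suggested on another page), but besides being a bit clunky, it does not allow faceting because it changes the source data frame.

Below is a data frame built specifically for this question.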

elements <- c("water","water","water","water","water","water","air","air","air","air","air","air","earth","earth","earth","earth","earth","earth","fire","fire","fire","fire","fire","fire","aether","aether","aether","aether","aether","aether")
shapes <- c("icosahedron","icosahedron","icosahedron","icosahedron","icosahedron","icosahedron","octahedron","octahedron","octahedron","octahedron","octahedron","octahedron","cube","cube","cube","cube","cube","cube","tetrahedron","tetrahedron","tetrahedron","tetrahedron","tetrahedron","tetrahedron","dodecahedron","dodecahedron","dodecahedron","dodecahedron","dodecahedron","dodecahedron")
greek_letter <- c("alpha","beta","gamma","delta","epsilon","zeta","alpha","beta","gamma","delta","epsilon","zeta","alpha","beta","gamma","delta","epsilon","zeta","alpha","beta","gamma","delta","epsilon","zeta","alpha","beta","gamma","delta","epsilon","zeta")
existence <- c("real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","not real","not real","not real","not real","not real","not real")
value <- c(0,0,0,5,7,0,0,1,0,20,3,0,0,2,2,1,8,0,0,8,10,4,2,0,0,0,0,1,1,0)
importance <- c(20,20,20,20,20,20,10,10,10,10,10,10,3,3,3,3,3,3,9,9,9,9,9,9,50,50,50,50,50,50)
platonic <- data.frame(elements,shapes,greek_letter,existence,value,importance)

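(Note: I added the "shapes" column even though I do not use it, as a reminder that I do not want to lose any data along the way, since I will need it in the end.)

The original setup is a ggplot using just "mean", with faceting, as follows: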

ggplot(data = platonic)+
  stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value), fun.y = "mean", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)+
  facet_wrap(~elements~existence)

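Below is the corresponding code, but with "weighted.mean". The "w" aesthetic is ignored, so all weights are assumed equal (that is how weighted.mean is defined), which results in a simple mean.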

ggplot(data = platonic)+
  stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value, w=platonic$importance), fun.y = "weighted.mean", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)

As you can see, it gives the warning: Ignoring unknown aesthetics: w
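To make the effect of the ignored aesthetic concrete, here is a small base R check using the five "delta" values and their weights from the data frame above. It is only an illustration of how weighted.mean behaves when no weights reach it, not a description of ggplot internals.

```r
# Values and weights of the five "delta" rows in the example data
x <- c(5, 20, 1, 4, 1)
w <- c(20, 10, 3, 9, 50)

unweighted <- weighted.mean(x)     # no w supplied: falls back to the plain mean
plain      <- mean(x)
weighted   <- weighted.mean(x, w)  # w supplied: sum(x * w) / sum(w)

print(c(unweighted = unweighted, plain = plain, weighted = weighted))
```

The first two numbers agree, which is what the bar plot ends up showing when the weights are dropped.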

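I tried several ways to make it "see" the weight variable, without success. Eventually I decided the most promising approach was to redefine the weighted.mean function so that its default "w" is a function of "x". weighted.mean would still not receive any "w", but it would fall back to that default. To do this, I tried nesting the native function (weighted.mean) inside a wrapper function, which lets me change the arguments.

Step by step.

First, I tried it with "mean" (and it worked).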

mean.modif <- function(x) {
  mean(x)
}

ggplot(data = platonic)+
      stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value), fun.y = "mean.modif", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)

Then with weighted.mean:

weighted.mean.modif <- function(x,w) {
  weighted.mean(x,w)
}

ggplot(data = platonic)+
  stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value), fun.y = "weighted.mean.modif", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)

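But it still does not read "w" (because no "w" is specified), so it returns the ordinary mean.

Then I tried setting the "w" argument to the weight column of the data frame: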

weighted.mean.modif1 <- function(x,w=platonic$importance) {
  weighted.mean(x,w)
}

ggplot(data = platonic)+
  stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value), fun.y = "weighted.mean.modif1", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)

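But this does not work. Warning message: Computation failed in `stat_summary()`: 'x' and 'w' must have the same length

Stuck, I tried generating a sequence of random numbers with the same length as "x", and surprisingly it worked.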
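The length error can be reproduced outside ggplot. The sketch below assumes that the summary function is handed one bar's worth of values at a time, and imitates that with split(); the compact rep() calls simply rebuild the question's columns.

```r
greek_letter <- rep(c("alpha", "beta", "gamma", "delta", "epsilon", "zeta"), times = 5)
value <- c(0,0,0,5,7,0, 0,1,0,20,3,0, 0,2,2,1,8,0, 0,8,10,4,2,0, 0,0,0,1,1,0)
importance <- rep(c(20, 10, 3, 9, 50), each = 6)

# One chunk of `value` per bar, roughly what a summary function receives
chunks <- split(value, greek_letter)
chunk_lengths <- lengths(chunks)   # 5 values per letter

# The default w is the full 30-element column, so the lengths cannot match
msg <- tryCatch(
  weighted.mean(chunks[["delta"]], importance),
  error = function(e) conditionMessage(e)
)
print(chunk_lengths)
print(msg)
```

Each chunk holds 5 values while the default weight vector holds 30, which is exactly the mismatch the warning reports.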

weighted.mean.modif2 <- function(x,w=runif(x, min = 0, max = 100)) {
  weighted.mean(x,w)
}
ggplot(data = platonic)+
  stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value), fun.y = "weighted.mean.modif2", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)

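Clearly there is a way to trick it, but it is useless if I can only use random weights.

I tried printing "x" inside the function and then applying it; although it produced something, even "mean" did not work properly.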
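The random-weight version runs because of how runif treats its first argument: when that argument is a vector of length greater than one, its length is taken as the number of draws. A short check, with an arbitrary seed:

```r
x <- c(5, 20, 1, 4, 1)

set.seed(1)  # arbitrary seed, only to make the draw repeatable
w <- runif(x, min = 0, max = 100)  # length(x) > 1, so length(x) numbers are drawn

print(length(w))
result <- weighted.mean(x, w)  # runs without error, but the weights are meaningless
```

So the default always matches the length of "x", which avoids the error without bringing in the real weights.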

mean.modif3 <- function(x) {
  mean(x)
  print(x)
}
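A likely reason this version misbehaves is that an R function returns the value of its last expression, and print(x) returns x itself. The function therefore hands back the whole vector instead of a single number. The sketch below contrasts it with a reordered variant; print_then_mean is a hypothetical name used only for illustration.

```r
# As in the question: print(x) is the last expression, so x itself is returned
mean.modif3 <- function(x) {
  mean(x)
  print(x)
}

# Hypothetical variant: print first, leave the mean as the last expression
print_then_mean <- function(x) {
  print(x)
  mean(x)
}

x <- c(5, 20, 1, 4, 1)
returned_original  <- mean.modif3(x)      # the whole vector, not one number
returned_reordered <- print_then_mean(x)  # a single number, the mean of x
```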

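So the tricky part I cannot figure out is how to tie the default "w" to "x" properly, so that when weighted.mean is called inside stat_summary without reading "w", it uses the right weights anyway.

As I mentioned, there is also a ddply workaround to get a weighted-mean plot. Because it works by creating a new source data frame that contains only the grouping variable and the weighted means, it does not allow faceting!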

weighted.fictious <- function(xxxx, yyyy) {
  ddply(xxxx, .(yyyy), function(m) data.frame(fictious_weightedmean=weighted.mean(m$value, m$importance, na.rm = FALSE)))
}

ggplot(data = weighted.fictious(xxxx = platonic, yyyy = platonic$greek_letter), aes(x=yyyy, y=fictious_weightedmean))+
  geom_bar(stat = "identity")

Thanks!

Answer 1

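ggplot's built-in summary functions are not always very helpful; in many cases you are better off computing the summary in a separate step and then plotting it.

Your basic example plot is actually incorrect. It shows "aether" with delta and epsilon means of 5 and 7 respectively, which is clearly not the case in the original data (both values are 1). Those are instead the values for the first element in the data frame ("water"). The error arises because ggplot builds its facets in alphabetical order while you are passing in raw vectors (platonic$value rather than simply value), so things get plotted in the wrong place. When using ggplot, you should always pass bare, unquoted column names so that ggplot can work out how to handle the associated data.

The correct version of the basic plot is: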

g <- ggplot(data = platonic)+
  stat_summary(mapping = aes(x=greek_letter, y=value), fun = "mean", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)+  # on ggplot2 < 3.3.0 use fun.y instead of fun
  facet_wrap(~elements~existence)
print(g)

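As for using weighted.mean, as I said above, the only sensible approach here is to compute it separately and plot the result:

library(dplyr)  # provides %>%, group_by() and summarize()

# weighted.mean() takes its weights in the argument `w`
platonic.weighted <- platonic %>% 
  group_by(elements, existence, greek_letter) %>% 
  summarize(value = weighted.mean(value, w = importance))

Because the resulting data frame still has all the column names used in the first plot, you can swap in the new data set: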

g.weighted <- g %+% platonic.weighted

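In this example the two plots are identical, because each combination of elements, existence and greek_letter contains a single row, so its weighted mean is just that row's value. Your mileage may vary with other data.

Your question is not clear about what final result you expect, but from the examples given I assume you want the weighted mean for each Greek letter. We can do that easily with summarize, or, if you really want to, use mutate to add a column of weighted means without losing the original data: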

platonic.weighted <- platonic %>% 
  group_by(greek_letter) %>% 
  mutate(weighted.letter = weighted.mean(value, w = importance))
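
As a further illustration of computing first and plotting second while keeping facets, here is a minimal sketch. It assumes the goal is a weighted mean per Greek letter within each existence level; that grouping is an assumption, not something the question states. The summary uses base R, the data frame is rebuilt compactly with rep(), and the plotting step runs only if ggplot2 is installed.

```r
# Example data from the question, rebuilt compactly
platonic <- data.frame(
  elements     = rep(c("water", "air", "earth", "fire", "aether"), each = 6),
  greek_letter = rep(c("alpha", "beta", "gamma", "delta", "epsilon", "zeta"), times = 5),
  existence    = rep(c("real", "not real"), times = c(24, 6)),
  value        = c(0,0,0,5,7,0, 0,1,0,20,3,0, 0,2,2,1,8,0, 0,8,10,4,2,0, 0,0,0,1,1,0),
  importance   = rep(c(20, 10, 3, 9, 50), each = 6)
)

# Weighted mean of `value` per letter within each `existence` level
groups <- split(platonic, list(platonic$greek_letter, platonic$existence), drop = TRUE)
weighted <- do.call(rbind, lapply(groups, function(d) {
  data.frame(
    greek_letter = d$greek_letter[1],
    existence    = d$existence[1],
    value        = weighted.mean(d$value, w = d$importance)
  )
}))

# Plot the precomputed values; faceting works because `existence` is still a column
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(weighted, aes(x = greek_letter, y = value)) +
    geom_col() +
    facet_wrap(~existence)
  if (interactive()) print(p)
}
```

Any column that should drive a facet has to be part of the grouping, otherwise it is lost in the summary step.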
