
ggplot: weighted.mean and stat_summary in a faceted bar plot

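I have spent far too much time trying to get weighted.mean (or wtd.mean) working inside stat_summary. I have gone through several pages dealing with the same problem, but none of them has a clear solution. The main issue is that weighted.mean, once placed inside stat_summary, cannot find its weights, which apparently cannot be passed down from the ggplot and/or stat_summary aesthetics (believe me, I tried; see the examples). I have tried various approaches, and even produced a bar plot of weighted means with a ddply-based function (as suggested on another page), but apart from being somewhat clunky, it doesn't allow faceting because it changes the source data frame.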

Below is a data frame built specifically for this problem.

elements <- c("water","water","water","water","water","water","air","air","air","air","air","air","earth","earth","earth","earth","earth","earth","fire","fire","fire","fire","fire","fire","aether","aether","aether","aether","aether","aether")
shapes <- c("icosahedron","icosahedron","icosahedron","icosahedron","icosahedron","icosahedron","octahedron","octahedron","octahedron","octahedron","octahedron","octahedron","cube","cube","cube","cube","cube","cube","tetrahedron","tetrahedron","tetrahedron","tetrahedron","tetrahedron","tetrahedron","dodecahedron","dodecahedron","dodecahedron","dodecahedron","dodecahedron","dodecahedron")
greek_letter <- c("alpha","beta","gamma","delta","epsilon","zeta","alpha","beta","gamma","delta","epsilon","zeta","alpha","beta","gamma","delta","epsilon","zeta","alpha","beta","gamma","delta","epsilon","zeta","alpha","beta","gamma","delta","epsilon","zeta")
existence <- c("real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","not real","not real","not real","not real","not real","not real")
value <- c(0,0,0,5,7,0,0,1,0,20,3,0,0,2,2,1,8,0,0,8,10,4,2,0,0,0,0,1,1,0)
importance <- c(20,20,20,20,20,20,10,10,10,10,10,10,3,3,3,3,3,3,9,9,9,9,9,9,50,50,50,50,50,50)
platonic <- data.frame(elements,shapes,greek_letter,existence,value,importance)
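Since every element occupies six consecutive rows and the six Greek letters repeat in the same order, the same data frame can be built more compactly with rep(). This is a sketch that assumes exactly that layout; it yields the same columns and rows as the code above.

```r
# Five elements, each with six rows (one per Greek letter)
elements <- rep(c("water", "air", "earth", "fire", "aether"), each = 6)
shapes <- rep(c("icosahedron", "octahedron", "cube", "tetrahedron", "dodecahedron"), each = 6)
greek_letter <- rep(c("alpha", "beta", "gamma", "delta", "epsilon", "zeta"), times = 5)
# The first 24 rows are "real", the last 6 (aether) are "not real"
existence <- rep(c("real", "not real"), times = c(24, 6))
value <- c(0, 0, 0, 5, 7, 0,    # water
           0, 1, 0, 20, 3, 0,   # air
           0, 2, 2, 1, 8, 0,    # earth
           0, 8, 10, 4, 2, 0,   # fire
           0, 0, 0, 1, 1, 0)    # aether
# One importance (weight) per element, repeated over its six rows
importance <- rep(c(20, 10, 3, 9, 50), each = 6)
platonic <- data.frame(elements, shapes, greek_letter, existence, value, importance)
```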

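(Note: I added the "shapes" column even though I don't use it, just as a reminder that I don't want to lose any data in the process; I will need it in the end.)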

The original setup is a ggplot with a plain "mean", including faceting, as follows:

ggplot(data = platonic)+
  stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value), fun.y = "mean", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)+
  facet_wrap(~elements~existence)

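Below is the corresponding code with "weighted.mean" instead. The "w" aesthetic is ignored, so all weights are assumed to be equal (as defined by the weighted.mean function), which results in a simple mean: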

ggplot(data = platonic)+
  stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value, w=platonic$importance), fun.y = "weighted.mean", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)

As you can see, it issues the warning "Warning: Ignoring unknown aesthetics: w".
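The warning reflects how stat_summary uses its summary function: roughly, the function is applied to the y values of each x group, one group at a time, and nothing else from the row (such as a weight column) is handed to it. The sketch below emulates that idea with split() and sapply() on hypothetical toy data; it is an illustration of the behaviour, not ggplot2's internal code.

```r
# Hypothetical toy data: two x groups; the weights differ within group "a"
toy <- data.frame(letter = c("a", "a", "b", "b"),
                  value = c(1, 3, 2, 4),
                  importance = c(3, 1, 1, 1))

# Emulation of the idea (not ggplot2's actual code): the summary function is
# called once per x group and receives only that group's y values
emulated <- sapply(split(toy$value, toy$letter), weighted.mean)
emulated   # a: 2, b: 3 (w is missing, so all weights are equal)

# What was wanted: weights taken from the same rows as the values
wanted <- sapply(split(toy, toy$letter),
                 function(d) weighted.mean(d$value, d$importance))
wanted     # a: 1.5, b: 3
```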

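I tried several ways to make it "see" the weight variable, without success. Eventually I realised that the most promising approach was to redefine the weighted.mean function so that its default "w" is a function of "x". weighted.mean still wouldn't see any "w" passed in, but it would compute the default. To do this, I tried wrapping the native function (weighted.mean) in a generic function of my own, which lets me change the arguments.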

Step by step.

First, I tried it with "mean" (and it worked).

mean.modif <- function(x) {
  mean(x)
}

ggplot(data = platonic)+
      stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value), fun.y = "mean.modif", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)

然后用weighted.mean

weighted.mean.modif <- function(x, w) {
  weighted.mean(x, w)
}

ggplot(data = platonic)+
  stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value), fun.y = "weighted.mean.modif", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)

But it still doesn't read "w" (because "w" is never supplied), so it returns the ordinary mean.
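This happens because an argument that is missing in the wrapper is still missing when it is forwarded, and weighted.mean() treats a missing w as equal weights. A minimal check using the wrapper from the question:

```r
weighted.mean.modif <- function(x, w) {
  weighted.mean(x, w)
}

# w is never supplied, so it is also missing inside weighted.mean()
# and every value gets the same weight: (1 + 3) / 2
plain <- weighted.mean.modif(c(1, 3))

# Supplying w explicitly changes the result: (1 * 3 + 3 * 1) / (3 + 1)
weighted <- weighted.mean.modif(c(1, 3), w = c(3, 1))
```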

然后我嘗試將“ w”參數指定為數據框中的權重列

weighted.mean.modif1 <- function(x,w=platonic$importance) {
  weighted.mean(x,w)
}

ggplot(data = platonic)+
  stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value), fun.y = "weighted.mean.modif1", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)

But that doesn't work either. Warning message: Computation failed in stat_summary(): 'x' and 'w' must have the same length
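The length error follows from the per-group call: the function receives one x group at a time (five rows per Greek letter when no facets are used), while the default platonic$importance always has all 30 rows. A minimal reproduction outside ggplot, with the weight column rebuilt by rep() so that the snippet stands alone:

```r
# One x group: each Greek letter has 5 rows (one per element), e.g. "beta"
x <- c(0, 1, 2, 8, 0)
# The default w = platonic$importance is the whole column: 30 values
w <- rep(c(20, 10, 3, 9, 50), each = 6)

err <- tryCatch(weighted.mean(x, w), error = function(e) e)
conditionMessage(err)   # "'x' and 'w' must have the same length"
```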

Stuck, I tried generating a sequence of random numbers with the same length as "x", and surprisingly, it worked.

weighted.mean.modif2 <- function(x,w=runif(x, min = 0, max = 100)) {
  weighted.mean(x,w)
}
ggplot(data = platonic)+
  stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value), fun.y = "weighted.mean.modif2", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)

So there clearly is a way to trick it, but it's useless if I can only use random weights.
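The random-weight version only "works" because of how runif() reads its first argument: when n is a vector with more than one element, its length is used as the number of draws. The default therefore always has the right length, but it carries no information about which rows the values came from:

```r
x <- c(0, 1, 2, 8, 0)   # one group's values, as stat_summary would pass them

# length(n) > 1, so runif() draws length(x) numbers instead of using x's values
w_random <- runif(x, min = 0, max = 100)
length(w_random) == length(x)   # TRUE

# The random weights change from call to call and are unrelated to the
# importance column, so the resulting bar heights are meaningless
weighted.mean(x, w_random)
```

For a one-element x, runif() would treat the value itself as the count, so the lengths would usually no longer match and the same error would come back.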

I tried printing "x" inside the function and then applying it, but although it produced some output, not even "mean" worked properly any more:

mean.modif3 <- function(x) {
  mean(x)
  print(x)
}
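The reason mean.modif3 breaks is that an R function returns the value of its last expression. Here that is print(x), which returns x itself (invisibly), so the computed mean is discarded and the whole vector is handed back to stat_summary. Swapping the two lines keeps the debugging output and returns the mean; mean_debug is just an illustrative name:

```r
mean.modif3 <- function(x) {
  mean(x)
  print(x)   # last expression: the function returns x, not its mean
}

mean_debug <- function(x) {
  print(x)   # show which values stat_summary passes for this group
  mean(x)    # the last expression is the function's return value
}

back <- mean.modif3(c(0, 5, 7))    # prints the vector and returns it unchanged
result <- mean_debug(c(0, 5, 7))   # prints the vector and returns 4
```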

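So the tricky part I can't figure out is how to tie the default "w" correctly to "x", so that when weighted.mean is called inside stat_summary without reading "w", it still uses the correct weights.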

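As I mentioned, there is also a ddply workaround to get a weighted-mean plot: it builds a new source data frame containing only the grouping variable and the weighted mean, but that is exactly why it doesn't allow faceting!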

library(plyr)

weighted.fictious <- function(xxxx, yyyy) {
  ddply(xxxx, .(yyyy), function(m) data.frame(fictious_weightedmean=weighted.mean(m$value, m$importance, na.rm = FALSE)))
}

ggplot(data = weighted.fictious(xxxx = platonic, yyyy = platonic$greek_letter), aes(x=yyyy, y=fictious_weightedmean))+
  geom_bar(stat = "identity")
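Faceting is lost only because the summarised data keeps just the grouping variable and the weighted mean. If the facet variables are part of the grouping, they survive in the summary and can still be used for faceting. A base-R sketch of that idea follows; weighted_by() is a hypothetical helper, not part of any package, and it assumes the value and importance column names from the question:

```r
# Data from the question, rebuilt compactly so the snippet stands alone
platonic <- data.frame(
  elements = rep(c("water", "air", "earth", "fire", "aether"), each = 6),
  greek_letter = rep(c("alpha", "beta", "gamma", "delta", "epsilon", "zeta"), times = 5),
  existence = rep(c("real", "not real"), times = c(24, 6)),
  value = c(0, 0, 0, 5, 7, 0, 0, 1, 0, 20, 3, 0, 0, 2, 2, 1, 8, 0,
            0, 8, 10, 4, 2, 0, 0, 0, 0, 1, 1, 0),
  importance = rep(c(20, 10, 3, 9, 50), each = 6)
)

# Hypothetical helper: weighted mean of `value` for every combination of
# the grouping columns, keeping those columns in the result
weighted_by <- function(df, groups) {
  pieces <- split(df, df[groups], drop = TRUE)   # one data frame per combination
  rows <- lapply(pieces, function(d) {
    out <- d[1, groups, drop = FALSE]            # keep the grouping values
    out$value <- weighted.mean(d$value, d$importance)
    out
  })
  result <- do.call(rbind, rows)
  rownames(result) <- NULL
  result
}

# Weighted mean per Greek letter, kept separate for each "existence" facet
by_letter <- weighted_by(platonic, c("existence", "greek_letter"))
```

The result has existence, greek_letter and value columns, so it can be plotted with geom_col() and faceted on existence in the usual way.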

Thanks!

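ggplot's built-in summary functions aren't always helpful; in many cases you are better off computing the summaries in a separate step and then plotting them.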

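Your basic example plot is actually incorrect. It shows "aether" with delta and epsilon means of 5 and 7, which is clearly not the case in the original data (both of those values are 1). Those are, however, the values of the first element in the data frame ("water"). The error occurs because ggplot builds its facets in alphabetical order, while you are passing in raw vectors (platonic$value instead of just value), which makes things get plotted in the wrong place. When using ggplot, you should always pass bare, unquoted column names so that ggplot can work out how to handle the associated data.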

The correct version of the basic plot is:

library(ggplot2)

# fun replaces the older fun.y argument (deprecated since ggplot2 3.3.0)
g <- ggplot(data = platonic)+
  stat_summary(mapping = aes(x=greek_letter, y=value), fun = mean, geom = "bar", na.rm = TRUE, inherit.aes = FALSE)+
  facet_wrap(~ elements + existence)
print(g)


As for using weighted.mean: as I said above, the only sensible approach here is to compute it separately and plot the result:

library(dplyr)

platonic.weighted <- platonic %>% 
  group_by(elements, existence, greek_letter) %>% 
  summarize(value = weighted.mean(value, w = importance))
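Note that weighted.mean() names its weight argument w. A different name such as weights = is not an error: it is silently absorbed by the ... argument, w stays missing, and the call falls back to equal weights. A quick check on two hypothetical values:

```r
x <- c(1, 3)
w <- c(3, 1)

# "weights" is not an argument of weighted.mean(); it goes into `...`,
# so w is missing and both values get equal weight
wrong <- weighted.mean(x, weights = w)   # 2

# The weight argument is called w: (1 * 3 + 3 * 1) / 4
right <- weighted.mean(x, w = w)         # 1.5
```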

Since the resulting data frame still has all the column names used in the first plot, you can swap in the new data set:

g.weighted <- g %+% platonic.weighted

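In this example the two plots are identical, because every combination of elements, existence and greek_letter contains exactly one row, so each weighted mean is simply that row's value; with other data your mileage may vary.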

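Your question isn't clear about what the intended end result is, but from the examples given I assume you want the weighted mean for each Greek letter. We can easily do that with summarize or, if you really want, use mutate to add a column of weighted means without losing the original data: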

platonic.weighted <- platonic %>% 
  group_by(greek_letter) %>% 
  mutate(weighted.letter = weighted.mean(value, w = importance))
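For readers without dplyr, the same per-letter column can be added in base R with ave(). Because ave() summarises a single vector, this sketch passes row indices and looks up both value and importance inside the function; it is one possible approach, not the only one:

```r
# Data from the question, rebuilt compactly so the snippet stands alone
platonic <- data.frame(
  elements = rep(c("water", "air", "earth", "fire", "aether"), each = 6),
  greek_letter = rep(c("alpha", "beta", "gamma", "delta", "epsilon", "zeta"), times = 5),
  value = c(0, 0, 0, 5, 7, 0, 0, 1, 0, 20, 3, 0, 0, 2, 2, 1, 8, 0,
            0, 8, 10, 4, 2, 0, 0, 0, 0, 1, 1, 0),
  importance = rep(c(20, 10, 3, 9, 50), each = 6)
)

# Split the row indices by Greek letter, compute one weighted mean per letter,
# and write it back to every row of that letter (all 30 rows are kept)
platonic$weighted.letter <- ave(
  seq_len(nrow(platonic)),
  platonic$greek_letter,
  FUN = function(i) weighted.mean(platonic$value[i], platonic$importance[i])
)
```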


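Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.

© 2020-2024 STACKOOM.COM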