ggplot: weighted.mean and stat_summary in a facetted bar plot

I've spent too much time trying to get weighted.mean (or wtd.mean) to work properly inside stat_summary. I've looked at several pages that tackle the same issue, but none had a definitive solution. The main problem is that weighted.mean, once placed in stat_summary, fails to find its weights, which apparently cannot be passed down from the ggplot and/or stat_summary aesthetics (believe me, I tried; see the examples below). I tried various approaches, and I even produced a bar plot of weighted means using a ddply-based function (as suggested on another page), but besides being a bit clunky, it does not allow facetting, because it replaces the source data frame.

The following is a data frame built specifically for this problem.

elements <- c("water","water","water","water","water","water","air","air","air","air","air","air","earth","earth","earth","earth","earth","earth","fire","fire","fire","fire","fire","fire","aether","aether","aether","aether","aether","aether")
shapes <- c("icosahedron","icosahedron","icosahedron","icosahedron","icosahedron","icosahedron","octahedron","octahedron","octahedron","octahedron","octahedron","octahedron","cube","cube","cube","cube","cube","cube","tetrahedron","tetrahedron","tetrahedron","tetrahedron","tetrahedron","tetrahedron","dodecahedron","dodecahedron","dodecahedron","dodecahedron","dodecahedron","dodecahedron")
greek_letter <- c("alpha","beta","gamma","delta","epsilon","zeta","alpha","beta","gamma","delta","epsilon","zeta","alpha","beta","gamma","delta","epsilon","zeta","alpha","beta","gamma","delta","epsilon","zeta","alpha","beta","gamma","delta","epsilon","zeta")
existence <- c("real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","not real","not real","not real","not real","not real","not real")
value <- c(0,0,0,5,7,0,0,1,0,20,3,0,0,2,2,1,8,0,0,8,10,4,2,0,0,0,0,1,1,0)
importance <- c(20,20,20,20,20,20,10,10,10,10,10,10,3,3,3,3,3,3,9,9,9,9,9,9,50,50,50,50,50,50)
platonic <- data.frame(elements,shapes,greek_letter,existence,value,importance)

(A note: I've also added the "shapes" column even though I will not use it, as a reminder that I don't want to lose any data in the process; it needs to be available at the end.)

The original setting was a ggplot with a plain "mean", which includes facetting, as in:

ggplot(data = platonic)+
  stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value), fun.y = "mean", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)+
  facet_wrap(~elements~existence)

The following is the corresponding code with "weighted.mean". The "w" aesthetic is ignored, so all the weights are assumed to be equal (by the definition of the weighted.mean function), which results in a simple mean:

ggplot(data = platonic)+
  stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value, w=platonic$importance), fun.y = "weighted.mean", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)

As you can see, it gives a warning: "Warning: Ignoring unknown aesthetics: w"

I tried several ways to make it "see" the weight variable, but with no success. In the end I realised that the most promising way would be to redefine the weighted.mean function so that its default "w" would be a function of "x". weighted.mean would still not see any "w" aesthetic, but it would compute one by default. To achieve this I tried to wrap the native function (weighted.mean) in a function of my own, which allows me to change the arguments.

Step by step.

First I tried with "mean" (and it works).

mean.modif <- function(x) {
  mean(x)
}

ggplot(data = platonic)+
      stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value), fun.y = "mean.modif", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)

Then with weighted.mean:

weighted.mean.modif <- function(x,w) {
  weighted.mean(x,w)
}

ggplot(data = platonic)+
  stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value), fun.y = "weighted.mean.modif", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)

but it still doesn't read the "w" (as there's no "w" specified), so it gives back a normal mean.

Then I tried to specify the "w" argument as the weights column in the data frame:

weighted.mean.modif1 <- function(x,w=platonic$importance) {
  weighted.mean(x,w)
}

ggplot(data = platonic)+
  stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value), fun.y = "weighted.mean.modif1", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)

but it doesn't work. A warning message says: Computation failed in stat_summary(): 'x' and 'w' must have the same length

Being stuck, I tried generating a random series of numbers of the same length as "x", and it surprisingly worked.

weighted.mean.modif2 <- function(x,w=runif(x, min = 0, max = 100)) {
  weighted.mean(x,w)
}
ggplot(data = platonic)+
  stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value), fun.y = "weighted.mean.modif2", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)

Obviously, there's a way to trick it, but it's no use if I can only use random weights.

I tried to print "x" within the function and then applied it and, while it produces something, even "mean" doesn't work properly anymore.

mean.modif3 <- function(x) {
  mean(x)
  print(x)
}

So, the tricky part that I cannot figure out is how to properly relate the "w" default to "x", so that when weighted.mean is called within stat_summary without being given a "w", it still uses the correct weights.

As I mentioned, there is also a ddply workaround to obtain a weighted mean plot. It works by creating a new source data frame that holds only the grouping variable and the weighted means, but it does not allow facetting!

weighted.fictious <- function(xxxx, yyyy) {
  ddply(xxxx, .(yyyy), function(m) data.frame(fictious_weightedmean=weighted.mean(m$value, m$importance, na.rm = FALSE)))
}

ggplot(data = weighted.fictious(xxxx = platonic, yyyy = platonic$greek_letter), aes(x=yyyy, y=fictious_weightedmean))+
  geom_bar(stat = "identity")

Thanks!

ggplot's built-in summary functions aren't always helpful, and much of the time you're better off computing your summary in a separate step and then plotting that.
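To see why the weights never reach the summary function, it may help to imitate what happens for a single bar. As far as I understand it, stat_summary hands the function only the y values of one group at a time, so a full-length weights column cannot line up with them. The base R sketch below mimics that for the "delta" bar. The names x_delta, w_delta, attempt and delta_weighted are mine, chosen for illustration, and this is not ggplot's actual internal code.

```r
# Example data from the question (only the columns needed here)
greek_letter <- rep(c("alpha", "beta", "gamma", "delta", "epsilon", "zeta"), times = 5)
value <- c(0,0,0,5,7,0, 0,1,0,20,3,0, 0,2,2,1,8,0, 0,8,10,4,2,0, 0,0,0,1,1,0)
importance <- rep(c(20, 10, 3, 9, 50), each = 6)

# What the summary function receives for one bar: that group's values only
x_delta <- value[greek_letter == "delta"]

# Pairing 5 values with all 30 weights fails; keep the message instead of stopping
attempt <- tryCatch(
  weighted.mean(x_delta, importance),
  error = function(e) conditionMessage(e)
)

# Subsetting the weights the same way as the values restores the pairing
w_delta <- importance[greek_letter == "delta"]
delta_weighted <- weighted.mean(x_delta, w_delta)
```

Here attempt ends up holding an error message rather than a number, while delta_weighted is computed from matching value/weight pairs. Grouping the data yourself gives you that pairing for free.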

Your basic example plot is actually incorrect. It shows "aether" as having means for delta and epsilon of 5 and 7, respectively, which is clearly not the case in the raw data (both of these values are 1). Those are instead the values for the first element in the data frame ("water"). The error arises because ggplot builds its facets in alphabetical order while you are passing in the raw vectors (platonic$value, rather than simply value), which causes things to be plotted in the wrong position. You should always pass the bare, unquoted column name when working with ggplot, so that ggplot can figure out how to work with the associated data.
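A rough base R illustration of that mismatch, under the assumption that the panels are laid out in sorted order while the raw vector is still in data-frame order. The names panel_order, correct and misaligned are hypothetical, and the sketch only reproduces the symptom; it does not claim to show how ggplot works internally.

```r
# Example data from the question (only the columns needed here)
elements <- rep(c("water", "air", "earth", "fire", "aether"), each = 6)
value <- c(0,0,0,5,7,0, 0,1,0,20,3,0, 0,2,2,1,8,0, 0,8,10,4,2,0, 0,0,0,1,1,0)
platonic <- data.frame(elements, value)

# Panels appear in sorted order, so "aether" comes first
panel_order <- sort(unique(platonic$elements))

# What the first panel should contain: the rows that belong to "aether"
correct <- platonic$value[platonic$elements == panel_order[1]]

# What a raw vector offers when read by position: the first six rows ("water")
misaligned <- platonic$value[seq_along(correct)]
```

The fourth and fifth entries (delta and epsilon) are 1 and 1 in correct but 5 and 7 in misaligned, which matches the wrong bars in the "aether" panel.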

The correct version of your basic plot would be:

g <- ggplot(data = platonic)+
  stat_summary(mapping = aes(x=greek_letter, y=value), fun.y = "mean", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)+
  facet_wrap(~elements~existence)
print(g)


As for using weighted.mean, as I said above, the only reasonable course of action here is to compute that separately and plot the result:

library(dplyr)
# weighted.mean() takes its weights through the `w` argument
platonic.weighted <- platonic %>% 
  group_by(elements, existence, greek_letter) %>% 
  summarize(value = weighted.mean(value, w = importance))

Since the resulting data frame still has all the column names used in the first plot, you can just swap in the new data set:

g.weighted <- g %+% platonic.weighted

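With this example, the two plots are identical, because every combination of element, existence and Greek letter contains a single row, so the weighted mean of each group is just its one value. Your mileage may vary with other data.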
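One detail worth double-checking in the summary step: weighted.mean() takes its weights through the argument named w. As far as I can tell, an argument with any other name, such as weights, is absorbed by "..." without complaint, and the result silently falls back to the plain mean. A small base R check on the five "delta" values from the example data (the variable names plain, weighted and swallowed are only for illustration):

```r
x <- c(5, 20, 1, 4, 1)      # the "delta" values, one per element
w <- c(20, 10, 3, 9, 50)    # the matching importance values

plain     <- mean(x)
weighted  <- weighted.mean(x, w = w)
swallowed <- weighted.mean(x, weights = w)  # `weights` is not a formal argument

# `swallowed` matches `plain`, not `weighted`
c(plain = plain, weighted = weighted, swallowed = swallowed)
```

If a weighted summary ever looks suspiciously like an unweighted one, the argument name is a good first thing to check.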

Your question is a little unclear as to what your expected end result is, but from the example given, I assume you want a weighted mean for each Greek letter. We can use summarize to do that easily, or, if you really want, you can use mutate instead to insert a column of weighted means without losing the original data:

platonic.weighted <- platonic %>% 
  group_by(greek_letter) %>% 
  mutate(weighted.letter = weighted.mean(value, w = importance))
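For a case where the weights actually change the bars and facetting still works, here is a minimal sketch that groups by the facetting variable as well. It assumes you want one panel per existence value, which is my guess rather than something stated in the question; the names by_existence and p are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Example data from the question, rebuilt compactly with rep()
platonic <- data.frame(
  elements = rep(c("water", "air", "earth", "fire", "aether"), each = 6),
  shapes = rep(c("icosahedron", "octahedron", "cube", "tetrahedron", "dodecahedron"), each = 6),
  greek_letter = rep(c("alpha", "beta", "gamma", "delta", "epsilon", "zeta"), times = 5),
  existence = rep(c("real", "not real"), times = c(24, 6)),
  value = c(0,0,0,5,7,0, 0,1,0,20,3,0, 0,2,2,1,8,0, 0,8,10,4,2,0, 0,0,0,1,1,0),
  importance = rep(c(20, 10, 3, 9, 50), each = 6)
)

# One weighted mean per Greek letter within each facet
by_existence <- platonic %>%
  group_by(existence, greek_letter) %>%
  summarize(value = weighted.mean(value, w = importance), .groups = "drop")

# The summary is already computed, so geom_col() just draws it
p <- ggplot(by_existence, aes(x = greek_letter, y = value)) +
  geom_col() +
  facet_wrap(~ existence)
print(p)
```

Any variable you facet on has to appear in group_by(), otherwise it is dropped by summarize(). If you need the other columns (such as shapes) to survive, the mutate variant keeps them, at the cost of repeating each weighted mean on every row of its group.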
