
Interpretation of `stat_summary(fun.data = mean_cl_boot)` in ggplot2?

A perhaps simple question: I tried to make an error graph like the one shown on page 532 of Field's *Discovering Statistics Using R*.

The code can be found here: http://www.sagepub.com/dsur/study/DSUR%20R%20Script%20Files/Chapter%2012%20DSUR%20GLM3.R

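```r
library(ggplot2)  # mean_cl_boot() also requires the Hmisc package to be installed

# gogglesData is loaded earlier in the book's chapter 12 script (not defined here)
line <- ggplot(gogglesData, aes(alcohol, attractiveness, colour = gender))
line + stat_summary(fun = mean, geom = "point") +  # 'fun' replaced 'fun.y' in ggplot2 3.3.0
  stat_summary(fun = mean, geom = "line", aes(group = gender)) +
  stat_summary(fun.data = mean_cl_boot, geom = "errorbar", width = 0.2) +
  labs(x = "Alcohol Consumption", y = "Mean Attractiveness of Date (%)", colour = "Gender")
```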

I produced the same graph. My y-axis variable has only 4 possible values (it is a discrete scale, 1-4), but the y-axis now shows the values 1.5, 2, and 2.5, and the lines vary between them.

So the question is: what do these points and lines describe? I assume the important part is `stat_summary(fun.data = mean_cl_boot, geom = "errorbar", width = 0.2)`. Are the values counts of observations for each group at each level of the x-axis? Are they frequencies? Or are they proportions?

I found this page, http://docs.ggplot2.org/0.9.3/stat_summary.html, but it did not help me.

Thank you.

Here is what the ggplot2 book says about `mean_cl_boot()` on page 83:

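| Function         | Hmisc original    | Middle | Range                         |
|------------------|-------------------|--------|-------------------------------|
| `mean_cl_boot()` | `smean.cl.boot()` | Mean   | Standard error from bootstrap |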

I think it is `smean.cl.boot()` from the Hmisc package, wrapped and renamed `mean_cl_boot()` in ggplot2.

And here is the description of the original function from the Hmisc package:

"smean.cl.boot is a very fast implementation of the basic nonparametric bootstrap for obtaining confidence limits for the population mean without assuming normality."
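To make "basic nonparametric bootstrap" concrete, here is a minimal base-R sketch of a percentile bootstrap interval for a mean. It assumes the percentile method (resample with replacement, take the mean of each resample, then read off the lower and upper quantiles), which is how we understand `smean.cl.boot()` to work; it is an illustration, not Hmisc's actual code. The helper name `boot_mean_ci` and the `scores` vector are hypothetical.

```r
# Hypothetical helper: percentile bootstrap confidence interval for a mean.
# A sketch of the general technique, not Hmisc's smean.cl.boot() source.
boot_mean_ci <- function(x, conf.int = 0.95, B = 1000, seed = NULL) {
  x <- x[!is.na(x)]
  if (!is.null(seed)) set.seed(seed)
  n <- length(x)
  # Draw B resamples of size n with replacement and keep each resample's mean
  boot_means <- replicate(B, mean(x[sample.int(n, n, replace = TRUE)]))
  alpha <- (1 - conf.int) / 2
  # Percentile interval: the alpha and (1 - alpha) quantiles of the bootstrap means
  limits <- quantile(boot_means, c(alpha, 1 - alpha), names = FALSE)
  # Same column names ggplot2 uses for fun.data summaries
  data.frame(y = mean(x), ymin = limits[1], ymax = limits[2])
}

scores <- c(1, 2, 2, 3, 3, 3, 4, 2, 1, 3)  # hypothetical ratings on a 1-4 scale
boot_mean_ci(scores, seed = 1)
```

This is why the error bars (and the points) sit at fractional values like 1.5 or 2.5 even though each rating is a whole number from 1 to 4: they are means of ratings and bounds on those means, not counts or proportions.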

I reproduced the graph using your code and got essentially the same graph as the one in Field's book, *Discovering Statistics Using R* (figure 12.12, page 532), except for the ordering of the alcohol levels on the x-axis. The y-axis displays the continuous variable, Mean Attractiveness of Date (%). Each point (and the line joining the points) is the mean of that variable for one gender at one alcohol level, because those layers use `mean` as the summary function; they are not counts, frequencies, or proportions. The 95% confidence intervals, created, as you point out, with `stat_summary()` and `fun.data = mean_cl_boot`, are bootstrap confidence intervals computed with the `smean.cl.boot()` function in Hmisc, as another commenter pointed out above. This function is described on page 262 of the Hmisc documentation. The ggplot2 documentation on `mean_cl_boot` is sparse and defers to the description in the Hmisc package.

Note that `mean_cl_boot` in ggplot2 takes the same arguments as `smean.cl.boot` in the Hmisc package. You can change the confidence level from the default of .95 with the `conf.int` argument, and the number of bootstrap samples (default 1000) with the `B` argument. In current versions of ggplot2 these arguments must be passed to `stat_summary()` through `fun.args`; passed directly, they are ignored with an "Ignoring unknown parameters" warning and the defaults are used. For example, here is the code for the same plot with a 99% confidence interval and 5000 bootstrap samples:

```r
library(ggplot2)  # mean_cl_boot() also requires the Hmisc package to be installed

line <- ggplot(gogglesData, aes(alcohol, attractiveness, colour = gender))
line + stat_summary(fun = mean, geom = "point") +
  stat_summary(fun = mean, geom = "line", aes(group = gender)) +
  # conf.int and B are forwarded to mean_cl_boot() via fun.args
  stat_summary(fun.data = mean_cl_boot, fun.args = list(conf.int = .99, B = 5000),
               geom = "errorbar", width = 0.2) +
  labs(x = "Alcohol Consumption", y = "Mean Attractiveness of Date (%)", colour = "Gender")
```
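To see exactly what each error bar is drawn from, you can call `mean_cl_boot()` directly on a numeric vector: `stat_summary()` does this once per x-level and colour group. A minimal sketch, assuming ggplot2 and Hmisc are installed; `scores` is hypothetical toy data standing in for one group's ratings.

```r
library(ggplot2)  # mean_cl_boot() calls Hmisc::smean.cl.boot(), so Hmisc must be installed

scores <- c(1, 2, 2, 3, 3, 3, 4, 2, 1, 3)  # hypothetical ratings for one group

set.seed(42)  # bootstrap limits are random, so fix the seed for reproducibility
# Extra arguments go straight to smean.cl.boot(); the result is a one-row
# data frame with y (the mean) and ymin/ymax (the bootstrap confidence limits)
res99 <- mean_cl_boot(scores, conf.int = 0.99, B = 5000)
res95 <- mean_cl_boot(scores)  # defaults: conf.int = 0.95, B = 1000
res99
res95
```

The `y` column is what `geom = "point"` would show and `ymin`/`ymax` are the ends of the error bar; the 99% interval will usually be wider than the 95% one.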
