What does the trim argument stand for in the mean() function?

I just can't understand the concept of trim. At first I thought it rounded the numbers, but that doesn't make sense. Can anyone clarify what trim is doing here?

# The linkedin and facebook vectors have already been created for you
linkedin <- c(16, 9, 13, 5, 2, 17, 14)
facebook <- c(17, 7, 5, 16, 8, 13, 14)

# Calculate the mean of the sum
avg_sum <- mean(c(linkedin+facebook))

# Calculate the trimmed mean of the sum
avg_sum_trimmed <- mean(c(linkedin+facebook), trim = 0.2)

# Inspect both new variables
avg_sum
[1] 22.28571
avg_sum_trimmed
[1] 22.6

I'm including two calls to mean(), one with and one without the trim argument. Any comments that help clarify this concept are welcome.

According to ?mean:

trim - The fraction (0 to 0.5) of observations to be trimmed from each end of x before the mean is computed. Values of trim outside that range are taken as the nearest endpoint.
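To make the "nearest endpoint" rule concrete, here is a small sketch using the summed values from the question. The trim values 0.9 and -0.1 are chosen only for illustration; the variable names are not from the original post.

```r
# Summed linkedin + facebook values from the question
x <- c(33, 16, 18, 21, 10, 30, 28)

at_half    <- mean(x, trim = 0.5)   # upper endpoint: this is the median
above_half <- mean(x, trim = 0.9)   # outside the range, treated like 0.5
below_zero <- mean(x, trim = -0.1)  # outside the range, treated like 0

at_half
above_half
below_zero

median(x)  # for comparison with the first two
mean(x)    # for comparison with the third
```

So a trim of 0.5 or more gives the median, and a negative trim gives the ordinary mean.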

If we use the vector 'v1'

v1 <- c(linkedin + facebook)

with length 7, the sorted values would be

v2 <- sort(v1)

Removing 20% of the observations from each end of the sorted vector amounts to removing the first and last observations, because 0.2 * 7 = 1.4 is rounded down to 1 observation per end:

mean(v2[-c(1, 7)])
#[1] 22.6

which is equal to

mean(v1, trim = 0.2)
#[1] 22.6

Checking with trim = 0.4, which drops floor(0.4 * 7) = 2 observations from each end:

mean(v2[-c(1:2, 6:7)])
#[1] 22.33333
mean(v1, trim = 0.4)
#[1] 22.33333
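The two checks above can be folded into one hand-written function. This is only a sketch of the idea, not R's actual implementation: trimmed_mean_manual is a hypothetical helper name, and it assumes a non-empty numeric vector without NA values.

```r
# Hypothetical helper (not part of base R): a hand-written trimmed mean.
# Assumes x is a non-empty numeric vector with no NA values.
trimmed_mean_manual <- function(x, trim = 0) {
  trim <- min(max(trim, 0), 0.5)       # out-of-range values go to the nearest endpoint
  if (trim == 0.5) return(median(x))   # at 0.5 the trimmed mean is the median
  n <- length(x)
  k <- floor(n * trim)                 # whole observations dropped from each end
  sorted <- sort(x)
  mean(sorted[(k + 1):(n - k)])        # average what is left in the middle
}

v1 <- c(16, 9, 13, 5, 2, 17, 14) + c(17, 7, 5, 16, 8, 13, 14)

trimmed_mean_manual(v1, trim = 0.2)  # compare with mean(v1, trim = 0.2)
trimmed_mean_manual(v1, trim = 0.4)  # compare with mean(v1, trim = 0.4)
```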

The code you show looks like an example from Intermediate R on Datacamp. Unfortunately, the course offers no further explanation of what a trimmed mean does or when you should actually use it. I was also quite lost as to why we should use it. Here's what I found:

First of all, a trimmed mean is a robust estimator of central tendency. Its computation is quite simple, since you only have to 1) remove a predetermined number of observations from each end of the sorted data and then 2) average the remaining observations. By discarding some observations at each end of an asymmetric distribution, the trimmed mean gives a better estimate of where the bulk of the observations lies, and its standard error is less affected by outliers than that of the 'traditional' mean.
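A quick way to see the robustness is to plant an outlier and compare both estimates. In this sketch the value 1000 is made up for illustration and the variable names are not from the original post; the other numbers are the summed values from the question.

```r
clean   <- c(10, 16, 18, 21, 28, 30, 33)
spoiled <- c(10, 16, 18, 21, 28, 30, 1000)  # 33 replaced by a made-up outlier

# The ordinary mean is dragged far upwards by the single extreme value
mean(clean)
mean(spoiled)

# The 0.2 trimmed mean drops the smallest and largest value first,
# so the outlier never enters the average
mean(clean, trim = 0.2)
mean(spoiled, trim = 0.2)
```

Both trimmed means are computed from the same five middle values, so they agree, while the two ordinary means differ widely.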

Let's look at the Datacamp example you provided:

linkedin <- c(16, 9, 13, 5, 2, 17, 14)
facebook <- c(17, 7, 5, 16, 8, 13, 14)

If you add them

link_and_fb <- linkedin+facebook

#You get
> link_and_fb
[1] 33 16 18 21 10 30 28

Now remember that you wanted a 0.2 trimmed mean. Before computing it, R sorts your vector

sorted <- sort(link_and_fb)
> sorted
[1] 10 16 18 21 28 30 33

Given that you have 7 observations, 0.2*7 = 1.4, which R rounds down to 1, so one observation is removed from each side of the distribution. Thus, you'll get rid of 10 and 33, and then divide the sum of the remaining observations by 5

(16+18+21+28+30)/5
[1] 22.6

#Which is what you get with
mean(c(linkedin+facebook), trim = 0.2)
[1] 22.6
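Because only whole observations can be dropped, the number removed per side is floor(n * trim), and some trim values have no effect on a short vector. A small sketch for n = 7 follows; the trim values and variable names are chosen only for illustration.

```r
x <- c(16, 9, 13, 5, 2, 17, 14) + c(17, 7, 5, 16, 8, 13, 14)
n <- length(x)

# Whole observations dropped from each end for a few trim values
trims   <- c(0.1, 0.2, 0.3, 0.4)
dropped <- floor(n * trims)
names(dropped) <- trims
dropped

# With 7 observations, trim = 0.1 drops nothing (floor(0.7) is 0),
# so the "trimmed" mean is just the ordinary mean
mean(x, trim = 0.1)
mean(x)
```

With a vector this short, trim = 0.3 and trim = 0.4 also give the same result, since both drop two observations per end.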
