Mean and SD in R
Maybe this is a very easy question. This is my data.frame:
> read.table("text.txt")
V1 V2
1 26 22516
2 28 17129
3 30 38470
4 32 12920
5 34 30835
6 36 36244
7 38 24482
8 40 67482
9 42 23121
10 44 51643
11 46 61064
12 48 37678
13 50 98817
14 52 31741
15 54 74672
16 56 85648
17 58 53813
18 60 135534
19 62 46621
20 64 89266
21 66 99818
22 68 60071
23 70 168558
24 72 67059
25 74 194730
26 76 278473
27 78 217860
This means that I have 22516 sequences of length 26, 17129 sequences of length 28, and so on. I would like to know the mean sequence length and its standard deviation. I know one way to do it: create a vector with 26 repeated 22516 times and so on, and then compute the mean and SD. However, I think there is an easier method. Any ideas?
Thanks.
For mean: (V1 %*% V2)/sum(V2)
For SD: sqrt(((V1 - c(V1 %*% V2)/sum(V2))^2 %*% V2)/sum(V2))
(c() drops the 1x1 matrix returned by %*% so that it can be subtracted from the vector V1. This version divides by sum(V2), so it is the population SD.)
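The SD expression above divides by sum(V2), which gives the population SD, whereas R's sd() divides by N - 1. A minimal sketch of both, assuming the two columns are available as plain vectors V1 and V2 (typed in here from the question's table; the names N, m, ss, sd_pop and sd_sample are only illustrative):

```r
# Lengths (V1) and counts (V2) copied from the question's table.
V1 <- seq(26, 78, by = 2)
V2 <- c(22516, 17129, 38470, 12920, 30835, 36244, 24482, 67482, 23121,
        51643, 61064, 37678, 98817, 31741, 74672, 85648, 53813, 135534,
        46621, 89266, 99818, 60071, 168558, 67059, 194730, 278473, 217860)

N  <- sum(V2)                    # total number of sequences
m  <- sum(V1 * V2) / N           # weighted mean of the lengths
ss <- sum(V2 * (V1 - m)^2)       # weighted sum of squared deviations

sd_pop    <- sqrt(ss / N)        # population SD: divide by N
sd_sample <- sqrt(ss / (N - 1))  # sample SD: divide by N - 1, as sd() does

c(mean = m, sd_pop = sd_pop, sd_sample = sd_sample)
```

With roughly two million sequences the two SDs differ only far beyond the reported decimals, so the choice matters mainly for small counts.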
I do not find
mean(rep(V1,V2)) # 61.902
sd(rep(V1,V2)) # 14.23891
that complex, but alternatively you might try:
weighted.mean(V1,V2) # 61.902
# recipe from http://www.ltcconline.net/greenl/courses/201/descstat/meansdgrouped.htm
sqrt((sum((V1^2)*V2)-(sum(V1*V2)^2)/sum(V2))/(sum(V2)-1)) # 14.23891
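For reference, the sqrt(...) line above is the usual shortcut form of the sample SD for grouped data. A sketch of where it comes from, writing x_i for the lengths (V1), f_i for the counts (V2) and N for sum(V2); these symbols are introduced here only for illustration:

```latex
\begin{align*}
N &= \sum_i f_i, \qquad \bar{x} = \frac{1}{N}\sum_i f_i x_i \\
\sum_i f_i (x_i - \bar{x})^2
  &= \sum_i f_i x_i^2 - 2\bar{x}\sum_i f_i x_i + N\bar{x}^2
   = \sum_i f_i x_i^2 - \frac{\left(\sum_i f_i x_i\right)^2}{N} \\
s &= \sqrt{\frac{\sum_i f_i x_i^2 - \left(\sum_i f_i x_i\right)^2 / N}{N - 1}}
\end{align*}
```

The N - 1 in the denominator is what makes the result match sd(rep(V1,V2)).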
Step 1: Set up data:
dat.df <- read.table(text="id V1 V2
1 26 22516
2 28 17129
3 30 38470
4 32 12920
5 34 30835
6 36 36244
7 38 24482
8 40 67482
9 42 23121
10 44 51643
11 46 61064
12 48 37678
13 50 98817
14 52 31741
15 54 74672
16 56 85648
17 58 53813
18 60 135534
19 62 46621
20 64 89266
21 66 99818
22 68 60071
23 70 168558
24 72 67059
25 74 194730
26 76 278473
27 78 217860",header=T)
Step 2: Convert to data.table (only for simplicity and laziness in typing)
library(data.table)
dat <- data.table(dat.df)
Step 3: Set up new columns with the products, and use them to find the mean and SD
dat[,pr:=V1*V2]
dat[,v1sq:=as.numeric(V1*V1*V2)]
dat.Mean <- sum(dat$pr)/sum(dat$V2)
dat.SD <- sqrt( (sum(dat$v1sq)/sum(dat$V2)) - dat.Mean^2)
Hope this helps!!
MEAN = sum(V1*V2)/sum(V2)
SD = sqrt(sum(V1*V1*V2)/sum(V2) - MEAN^2)
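A note on the as.numeric() call in Step 3: read.table() stores these columns as integers, and the sum of V1*V1*V2 is larger than R's maximum integer. As far as I know, sum() on integers then returns NA with a warning, so converting to double first is what keeps the SD computable. A small sketch, with the data typed in as integers and the illustrative names int_total and dbl_total:

```r
# Lengths and counts stored as integers, as read.table() would produce.
V1 <- as.integer(seq(26, 78, by = 2))
V2 <- as.integer(c(22516, 17129, 38470, 12920, 30835, 36244, 24482, 67482,
                   23121, 51643, 61064, 37678, 98817, 31741, 74672, 85648,
                   53813, 135534, 46621, 89266, 99818, 60071, 168558, 67059,
                   194730, 278473, 217860))

# The integer total does not fit in a 32-bit integer, so this is NA
# (the overflow warning is suppressed here only to keep the output quiet).
int_total <- suppressWarnings(sum(V1 * V1 * V2))

# Converting to double before multiplying keeps the full value.
dbl_total <- sum(as.numeric(V1) * V1 * V2)

is.na(int_total)                    # the integer sum is lost
dbl_total > .Machine$integer.max    # the double sum exceeds the integer range
```

The same concern does not arise for sum(V1*V2) on this data, because that total still fits in an integer.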