Mean and SD in R

Maybe this is a very easy question. This is my data.frame:

> read.table("text.txt")
   V1       V2
1  26    22516
2  28    17129
3  30    38470
4  32    12920
5  34    30835
6  36    36244
7  38    24482
8  40    67482
9  42    23121
10 44    51643
11 46    61064
12 48    37678
13 50    98817
14 52    31741
15 54    74672
16 56    85648
17 58    53813
18 60   135534
19 62    46621
20 64    89266
21 66    99818
22 68    60071
23 70   168558
24 72    67059
25 74   194730
26 76   278473
27 78   217860

It means that I have 22516 sequences with length 26, 17129 sequences with length 28, etc. I would like to know the mean sequence length and its standard deviation. I know how to do it by creating a vector with 26 repeated 22516 times and so on, and then computing the mean and SD. However, I think there is an easier method. Any ideas?

Thanks.

For mean: (V1 %*% V2)/sum(V2)

For SD: sqrt(((V1-(V1 %*% V2)/sum(V2))**2 %*% V2)/sum(V2))
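A minimal sketch for checking these two formulas against the expanded vector. It assumes V1 and V2 are the two columns from the question, typed in here as plain vectors; the names w_mean, w_sd, n and x are only illustrative. Note that the SD formula above divides by N, while sd() divides by N - 1, so the comparison rescales sd() accordingly.

```r
# Assumption: V1 holds the lengths and V2 the counts from the question's table.
V1 <- seq(26, 78, by = 2)
V2 <- c(22516, 17129, 38470, 12920, 30835, 36244, 24482, 67482, 23121,
        51643, 61064, 37678, 98817, 31741, 74672, 85648, 53813, 135534,
        46621, 89266, 99818, 60071, 168558, 67059, 194730, 278473, 217860)
n <- sum(V2)  # total number of sequences

# drop() turns the 1x1 matrix returned by %*% into a plain number.
w_mean <- drop(V1 %*% V2) / n

# Squared deviations from the mean, weighted by count, divided by N.
w_sd <- sqrt(drop((V1 - w_mean)^2 %*% V2) / n)

# Brute-force comparison: expand every length by its count.
x <- rep(V1, V2)
mean_matches <- isTRUE(all.equal(w_mean, mean(x)))
sd_matches   <- isTRUE(all.equal(w_sd, sd(x) * sqrt((n - 1) / n)))
```

With about two million sequences, the gap between the N and N - 1 versions is far smaller than the digits usually reported.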

I do not find

mean(rep(V1,V2)) # 61.902
sd(rep(V1,V2)) # 14.23891

that complex, but alternatively you might try:

weighted.mean(V1,V2) # 61.902
# recipe from http://www.ltcconline.net/greenl/courses/201/descstat/meansdgrouped.htm
sqrt((sum((V1^2)*V2)-(sum(V1*V2)^2)/sum(V2))/(sum(V2)-1)) # 14.23891
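The shortcut in the last line can be wrapped in a small helper. This is only a sketch: the name grouped_sd and its arguments are made up for illustration, and the toy data are chosen so the result can be checked by hand.

```r
# Hypothetical helper (not from the answer above): sample SD of grouped data,
# where `values` are the distinct values and `counts` says how often each occurs.
grouped_sd <- function(values, counts) {
  n      <- sum(counts)               # number of observations
  sum_x  <- sum(values * counts)      # sum of all observations
  sum_x2 <- sum(values^2 * counts)    # sum of all squared observations
  sqrt((sum_x2 - sum_x^2 / n) / (n - 1))
}

# Toy data: 1 once, 2 twice, 3 once, i.e. the vector c(1, 2, 2, 3).
values <- c(1, 2, 3)
counts <- c(1, 2, 1)

toy_sd   <- grouped_sd(values, counts)
brute_sd <- sd(rep(values, counts))   # same quantity via full expansion
```

For values that sit far from zero, subtracting two large sums like this can lose precision, so the rep() version may be the safer choice in such cases.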

Step1: Set up data:

dat.df <- read.table(text="id   V1       V2
1  26    22516
2  28    17129
                  3  30    38470
                  4  32    12920
                  5  34    30835
                  6  36    36244
                  7  38    24482
                  8  40    67482
                  9  42    23121
                  10 44    51643
                  11 46    61064
                  12 48    37678
                  13 50    98817
                  14 52    31741
                  15 54    74672
                  16 56    85648
                  17 58    53813
                  18 60   135534
                  19 62    46621
                  20 64    89266
                  21 66    99818
                  22 68    60071
                  23 70   168558
                  24 72    67059
                  25 74   194730
                  26 76   278473
                  27 78    217860",header=TRUE)

Step2: Convert to data.table (only for simplicity and laziness in typing)

library(data.table)
dat <- data.table(dat.df)

Step3: Set up new columns with products, and use them to find the mean and SD

dat[,pr:=V1*V2]                   # length times count
dat[,v1sq:=as.numeric(V1)*V1*V2]  # squared length times count; converting first avoids integer overflow

dat.Mean <- sum(dat$pr)/sum(dat$V2)

dat.SD <- sqrt( (sum(dat$v1sq)/sum(dat$V2)) - dat.Mean^2)
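The same two numbers can also be obtained without data.table. A minimal base R sketch, using only the first three rows of the table to keep it short; the names small.df, n, m, pop_sd and x are illustrative. Like dat.SD above, this form divides by N rather than N - 1.

```r
# Assumption: first three rows of the question's table, used only to keep the example small.
small.df <- data.frame(V1 = c(26, 28, 30), V2 = c(22516, 17129, 38470))

n <- sum(small.df$V2)                                # total number of sequences
m <- sum(small.df$V1 * small.df$V2) / n              # weighted mean
pop_sd <- sqrt(sum(small.df$V1^2 * small.df$V2) / n - m^2)  # mean of squares minus squared mean

# Brute-force check against the fully expanded vector.
x <- rep(small.df$V1, small.df$V2)
mean_ok <- isTRUE(all.equal(m, mean(x)))
sd_ok   <- isTRUE(all.equal(pop_sd, sqrt(mean((x - mean(x))^2))))
```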

Hope this helps!!

MEAN = sum(V1*V2)/sum(V2)

SD = sqrt(sum(V1*V1*V2)/sum(V2) - MEAN^2)
