R: computing the mean, median, and variance from a file containing a frequency distribution

Question

I am very new to R, and my question might be a little too obvious.

I have a file that contains the following data:

Score     Frequency
 100         10
 200         30
 300         40

How do I read this file and compute the mean, median, variance and standard deviation?

If the table above were just raw scores without any frequency information, I could do this:

x <- scan(file="scores.txt", what = integer())
median(x)

and so on, but I don't understand how to do these computations when given a frequency table.

Answer 1

Read the data with read.table (see ?read.table for how to read from a file). Then expand the data into a vector of individual scores by repeating each score according to its frequency. We can then write a function that returns the desired statistics. You can, of course, calculate each one individually if you don't wish to write a function.

d <- read.table(header = TRUE, text = "Score     Frequency
 100         10
 200         30
 300         40")

d2 <- rep(d$Score, d$Frequency)  ## expands the data by frequency of score

multi.fun <- function(x) {
    c(mean = mean(x), median = median(x), var = var(x), sd = sd(x))
}

multi.fun(d2)
#      mean     median        var         sd 
# 237.50000  250.00000 4905.06329   70.03616
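The example above reads the table from an inline string. As a minimal sketch of the same approach applied to a file on disk: the temporary file and the names path, scores and stats below are assumptions made for illustration, so substitute the path to your own file.

```r
# Hypothetical setup: write the sample table to a temporary file so the
# example is self-contained. In practice, point read.table at your own file.
path <- tempfile(fileext = ".txt")
writeLines(c("Score     Frequency",
             " 100         10",
             " 200         30",
             " 300         40"), path)

# header = TRUE takes the column names from the first line; the default
# separator (any run of whitespace) copes with the padded columns.
d <- read.table(path, header = TRUE)

# Repeat each score by its frequency to recover the individual observations.
scores <- rep(d$Score, d$Frequency)

stats <- c(mean = mean(scores), median = median(scores),
           var = var(scores), sd = sd(scores))
print(stats)
```

The printed values should agree with the output of multi.fun(d2) shown above.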

Answer 2

Depending on the format of your input file, you can use read.csv("scores.txt"). You can change the separator with read.csv("scores.txt", sep="\t"). If your data doesn't have a header, use the option header=FALSE.

I am going to use a comma as the separator, since it is easier to read here.

INPUT FILE

Score,Frequency
100,10
200,30
300,40

R Source Code

x <- read.csv("scores.txt")
mean(x$Score)
median(x$Score)
var(x$Score)
sd(x$Score)

R Output

> mean(x$Score)
[1] 200
> median(x$Score)
[1] 200
> var(x$Score)
[1] 10000
> sd(x$Score)
[1] 100

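The figures above count each score once and ignore the Frequency column. To take the frequency into account, repeat each score by its frequency with rep: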

R Source Code

x <- read.csv("scores.txt")
mean(rep(x$Score, x$Frequency))
median(rep(x$Score, x$Frequency))
var(rep(x$Score, x$Frequency))
sd(rep(x$Score, x$Frequency))

R Output

> x <- read.csv("scores.txt")
> mean(rep(x$Score, x$Frequency))
[1] 237.5
> median(rep(x$Score, x$Frequency))
[1] 250
> var(rep(x$Score, x$Frequency))
[1] 4905.063
> sd(rep(x$Score, x$Frequency))
[1] 70.03616
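The calls above rebuild the expanded vector for every statistic. One possible variation, sketched here with the CSV supplied inline through read.csv's text argument so that it runs without a file, expands the data once and applies each function to the same vector. The names expanded and result are illustrative only.

```r
# Inline copy of the sample CSV; replace text = ... with a file path as needed.
x <- read.csv(text = "Score,Frequency
100,10
200,30
300,40")

# Expand once: each score repeated as many times as its frequency.
expanded <- rep(x$Score, x$Frequency)

# Apply each summary function to the same expanded vector.
result <- sapply(list(mean = mean, median = median, var = var, sd = sd),
                 function(f) f(expanded))
print(result)
```

sapply returns a named numeric vector here, so individual values can be picked out by name, for example result["median"].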

Answer 3

Just do it the way you would by hand:

Let s be the vector of scores and f the vector of frequencies.

Sx = sum(s*f)
Sx2 = sum((s^2)*f)
n = sum(f)
theMean = Sx/n
SSx = Sx2 - n*theMean^2
sVar = SSx/(n-1)
ssd = sqrt(sVar)

This avoids the use of rep, which is a hassle when the frequencies are large, because the expanded vector has sum(f) elements.
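The steps above do not cover the median. As a rough sketch of how the whole calculation, median included, might be wrapped up without rep: freq_stats is a hypothetical helper name, and it assumes the frequencies are non-negative integers summing to at least 2. It also swaps in the deviation form sum(f * (s - mean)^2) for the variance in place of Sx2 - n*theMean^2; the two are algebraically equal, and the deviation form tends to lose less precision when the scores are large.

```r
# Hypothetical helper: summary statistics computed straight from a
# frequency table, without expanding it via rep().
freq_stats <- function(s, f) {
  # Sort by score so that cumulative frequencies are meaningful.
  o <- order(s)
  s <- s[o]
  f <- f[o]

  n <- sum(f)
  m <- sum(s * f) / n

  # Sample variance (n - 1 denominator), matching what var() returns.
  v <- sum(f * (s - m)^2) / (n - 1)

  # Median: find the score(s) at the middle position(s) using cumulative
  # counts. For even n the two middle values are averaged, as median() does.
  cf <- cumsum(f)
  lo <- s[which(cf >= floor((n + 1) / 2))[1]]
  hi <- s[which(cf >= ceiling((n + 1) / 2))[1]]

  c(mean = m, median = (lo + hi) / 2, var = v, sd = sqrt(v))
}

res <- freq_stats(c(100, 200, 300), c(10, 30, 40))
print(res)
```

For the sample data the result can be checked against the rep-based approach, for example with median(rep(c(100, 200, 300), c(10, 30, 40))).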

Answer 4

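lines <- readLines("scores.txt")[-1]   # drop the header line
lines <- lines[nzchar(trimws(lines))]  # skip blank lines, if any
# Anchor the pattern so that each group captures a whole number; the
# unanchored ".*(\\d+).*(\\d+).*" is greedy and keeps only single digits.
mat <- matrix(as.numeric(unlist(
    strsplit(gsub("^\\s*(\\d+)\\s+(\\d+)\\s*$", "\\1,\\2", lines), ","))),
  ncol = 2, byrow = TRUE)
print(summary(mat[, 1]))  # summary of the scores, ignoring frequency
print(summary(mat[, 2]))  # summary of the frequencies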

4 answers

Answer 1   6   2014-03-25 19:53:18
Answer 2   4   2014-03-25 19:52:58
Answer 3   3   2020-01-15 15:36:34
Answer 4   0   2014-03-25 19:36:03
