
R: how to calculate a confidence interval based on proportions

I'm new to R and trying to learn stats. Here is one practice question that I'm trying to figure out: [image: equation] How should I use R code to create a function based on this math equation?

I have a data frame like this: [image: data frame] The "exposed" column of the df contains two groups, one called "Test Group (Exposed)" and the other called "Control Group". So the math equation refers to these two groups.

In another exercise I have this code to calculate a confidence interval:

# sample size
# OK for non normal data if n > 30
n <- 150

# calculate the mean & standard deviation
will_mean <- mean(will_sample)
will_s <- sd(will_sample)

# normal quantile function, assuming mean has a normal distribution:
qnorm(p=0.975, mean=0, sd=1) # 97.5th percentile for a N(0,1) distribution
# a.k.a. Z = 1.96 from the standard normal distribution

# calculate the margin of error
# confidence interval = mean +/- critical value x (s/sqrt(n))
# "q" functions in R give the value of the statistic at a given quantile
critical_value <- qt(p=0.975, df=n-1)
error <- critical_value * will_s/sqrt(n)

# confidence interval
will_mean - error
will_mean + error

but I'm not sure how to do this for the two groups in the "exposed" column.

Don't worry, it's quite easy: if you have experience in at least one programming language, R is fairly straightforward. The main difference between R and most other programming languages is that R was developed for statistical purposes. You can compute the quantile for a given significance level α with the function qnorm() (remember to divide α by 2 for your formula, i.e. use qnorm(1 - α/2)). By default it uses the standard normal distribution, as in your case, but you can get more details from the documentation, reachable with the command ?qnorm. In the exercise you are not actually required to compute it, since z is passed as an argument, but in practice you would need to. The code should be something like:

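conf <- function(p1, p2, n1, n2, z) {
  # margin of error: z times the standard error of the difference p1 - p2
  part <- z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
  # lower and upper bounds of the interval for p1 - p2
  return(c(p1 - p2 - part,
           p1 - p2 + part))
}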
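
To connect this to the data frame in the question, here is a minimal sketch of how the inputs could be derived and the function called. Only the `exposed` column and its two group labels come from the question. The 0/1 outcome column `converted` and the counts used to build the example data are hypothetical, so substitute whatever column holds the outcome in your data. The last step is an optional cross-check against `prop.test()` with the continuity correction turned off, which should give the same interval.

```r
# Same interval as conf() above:
#   (p1 - p2) +/- z * sqrt(p1*(1-p1)/n1 + p2*(1-p2)/n2)
conf <- function(p1, p2, n1, n2, z) {
  part <- z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
  c(p1 - p2 - part, p1 - p2 + part)
}

# Hypothetical data: `converted` (0/1 outcome) is an assumed column name,
# and the group sizes and counts below are made up for illustration.
df <- data.frame(
  exposed   = rep(c("Test Group (Exposed)", "Control Group"),
                  times = c(200, 180)),
  converted = c(rep(c(1, 0), times = c(60, 140)),   # test group
                rep(c(1, 0), times = c(36, 144)))   # control group
)

# group sizes and sample proportions
is_test <- df$exposed == "Test Group (Exposed)"
n1 <- sum(is_test)
n2 <- sum(!is_test)
p1 <- mean(df$converted[is_test])
p2 <- mean(df$converted[!is_test])

# two-sided critical value for a 95% interval (alpha split over both tails)
alpha <- 0.05
z <- qnorm(1 - alpha / 2)

ci <- conf(p1, p2, n1, n2, z)
ci

# optional cross-check with base R, without continuity correction
pt <- prop.test(x = c(sum(df$converted[is_test]), sum(df$converted[!is_test])),
                n = c(n1, n2), correct = FALSE)
pt$conf.int
```

If the interval for p1 - p2 does not contain 0, the two groups differ at the chosen significance level under this normal approximation.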
