A/B Test Duration & Sample Size Calculator Using pwr() Package in R

I am using the googleAnalyticsR package in R to pull some website visit stats and calculate conversion rates. No problem thus far.

However, I have become stuck when trying to calculate the required sample size and test duration with the pwr package, using code I adapted from recommendations I found from another user online. Code below.

library(pwr)

average_daily_traffic <- 10.63  # cvr$all_users / 30
control <- 0.30721              # cvr$cvr_perc (baseline conversion rate)
uplift <- 0.01                  # relative uplift to detect

# Returns the pwr.p.test result for the given baseline rate and relative uplift.
sample_size_calculator <- function(control, uplift) {
  variant <- (uplift + 1) * control
  effect_size <- ES.h(control, variant)
  sample_size_output <- pwr.p.test(h = effect_size,
                                   n = NULL,  # left NULL so that pwr solves for n
                                   sig.level = 0.05,
                                   power = 0.8)
  if (variant >= 0) {
    sample_size_output
  } else {
    "N/A"
  }
}

# Converts a sample size into a number of days at the given daily traffic.
duration_calculator <- function(sample_size_output, average_daily_traffic) {
  days_required <- (sample_size_output * 2) / average_daily_traffic
  if (days_required >= 0) {
    paste0("It will take approximately ", round(days_required, digits = 0),
           " days or ", round(round(days_required, digits = 0) / 365, digits = 0),
           " years for this test to reach significance, based on average traffic in the last 30 days")
  } else {
    "N/A"
  }
}

# Store the result under its own name so the function is not overwritten.
sample_size_result <- sample_size_calculator(control, uplift)
sample_size_output <- sample_size_result$n
sample_size_output

duration_calculator(sample_size_output, average_daily_traffic)

The recommendation I saw online was to create 2 functions. One called 'sample_size_calculator' and the other called 'days_calculator' (named duration_calculator in my code above), with both being fairly self-explanatory. It is at least clear to me what the intended function of both is.

My output is thus:

[1] "It will take approximately 33394 days or 91 years for this test to reach significance, based on average traffic in the last 30 days"

This seemed fairly realistic to me until I tried to verify my results using a couple of other online tools, including VWO, Unbounce and AB Tasty, all of which suggest that my figure is roughly half (~0.5x) of what it should be in terms of the number of days required to run the test. I appreciate that some of the variance between those calculators will be due to how each formula handles rounding, but I'm more concerned with why and where my calculation goes wrong such that it underestimates the test duration by half.

I could simply multiply the resulting number by two and go to bed, but I'm keen to understand my error, or even learn a more statistically and syntactically elegant way of coding this.

Thanks in advance.

Personally, I'd suggest simulating data instead of relying on pre-packaged power calculations. You seem to have a good grasp of writing functions in R, so it won't be much of a step up for you to simulate data using iteration (e.g. with for loops or, what I'd personally recommend more, vectorized iteration with purrr).

The advantage of simulating data is that it forces you to think your model through in advance, which can be invaluable later on when you're modelling the real data.
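As a rough illustration of the simulation idea, here is a minimal base R sketch. It assumes a two-arm test with equal traffic in each arm and a pooled two-proportion z-test; `simulate_power()` and its arguments are hypothetical names chosen for this example, and the two sample sizes tried at the end are illustrative values rather than recommendations. It uses vectorised `rbinom()` draws rather than an explicit loop or purrr.

```r
# Estimate power by simulation for a two-arm A/B test on conversion rates.
# simulate_power() is a hypothetical helper, not part of any package.
simulate_power <- function(n_per_group, p_control, p_variant,
                           sig_level = 0.05, n_sims = 2000) {
  # Number of conversions in each arm, one draw per simulated experiment.
  x_control <- rbinom(n_sims, size = n_per_group, prob = p_control)
  x_variant <- rbinom(n_sims, size = n_per_group, prob = p_variant)

  # Pooled two-proportion z-test, computed for all simulations at once.
  p_hat_c <- x_control / n_per_group
  p_hat_v <- x_variant / n_per_group
  p_pool  <- (x_control + x_variant) / (2 * n_per_group)
  se      <- sqrt(p_pool * (1 - p_pool) * 2 / n_per_group)
  z       <- (p_hat_v - p_hat_c) / se
  p_value <- 2 * pnorm(-abs(z))

  # Power is the share of simulated experiments that reject the null.
  mean(p_value < sig_level)
}

set.seed(42)
control <- 0.30721               # baseline conversion rate from the question
variant <- control * (1 + 0.01)  # 1% relative uplift from the question

# Illustrative per-arm sample sizes (assumed values for this sketch).
power_small <- simulate_power(177000, control, variant)
power_large <- simulate_power(355000, control, variant)
```

Under these assumptions, comparing `power_small` with `power_large` shows how far a given per-arm sample size is from the 0.8 power target, without relying on a closed-form formula.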

This is a great, if a little bit dated, tutorial: http://disjointedthinking.jeffhughes.ca/2017/09/power-simulations-r/
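On the factor of two specifically, one likely explanation (offered here as a hedged note, not something stated in the answer above) is that `pwr.p.test` is the power calculation for a one-sample proportion test, whereas an A/B test compares two independent samples. The pwr package provides `pwr.2p.test` for two samples of equal size, and under the normal approximation its required n per group is twice the one-sample n. The sketch below reproduces that arithmetic in base R with the question's inputs; the variable names are illustrative.

```r
# Inputs taken from the question.
average_daily_traffic <- 10.63
control <- 0.30721
variant <- control * (1 + 0.01)

# Cohen's effect size h: the arcsine transformation that ES.h computes.
h <- abs(2 * asin(sqrt(variant)) - 2 * asin(sqrt(control)))

z_alpha <- qnorm(1 - 0.05 / 2)  # two-sided test at sig.level = 0.05
z_power <- qnorm(0.80)          # power = 0.8

# Normal-approximation sample sizes.
n_one_sample <- ((z_alpha + z_power) / h)^2      # one-sample test
n_per_group  <- 2 * ((z_alpha + z_power) / h)^2  # two-sample test, per arm

# Days needed when both arms share the daily traffic.
days_from_question <- 2 * n_one_sample / average_daily_traffic
days_two_sample    <- 2 * n_per_group / average_daily_traffic
```

Here `days_from_question` should land close to the figure reported in the question, and `days_two_sample` is exactly double it, which is consistent with the gap to the online calculators.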

