R: summing row one with all rows
I am trying to analyse website data for A/B testing. My reference point is the row where experimentName = Experiment 1 (the control version).
experimentName UniquePageView UniqueFrequency NonUniqueFrequency
1 Experiment 1 459 294 359
2 Experiment 2 440 286 338
3 Experiment 3 428 273 348
What I need to do is add the Experiment 1 values of UniquePageView, UniqueFrequency and NonUniqueFrequency to the corresponding values in every row, e.g.:
UniquePageView WHERE experimentName = 'Experiment 1 ' + UniquePageView WHERE experimentName = 'Experiment 2 ',
UniquePageView WHERE experimentName = 'Experiment 1 ' + UniquePageView WHERE experimentName = 'Experiment 3 '
and so on (I could have an unlimited number of experiments), then do the same for UniqueFrequency and NonUniqueFrequency (I could have an unlimited number of columns as well).
Expected result:
experimentName UniquePageView UniqueFrequency NonUniqueFrequency Conversion Rate Pooled UniquePageView Conversion Rate Pooled UniqueFrequency Conversion Rate Pooled NonUniqueFrequency
1 Experiment 1 459 294 359 918 588 718
2 Experiment 2 440 286 338 899 580 697
3 Experiment 3 428 273 348 887 567 707
Here is the math behind it:
experimentName UniquePageView UniqueFrequency NonUniqueFrequency Conversion Rate Pooled UniquePageView Conversion Rate Pooled UniqueFrequency Conversion Rate Pooled NonUniqueFrequency
1 Experiment 1 459 294 359 459 + 459 294 + 294 359 + 359
2 Experiment 2 440 286 338 459 + 440 294 + 286 359 + 338
3 Experiment 3 428 273 348 459 + 428 294 + 273 359 + 348
In base R, you can do this in one line by column-binding (with cbind) the initial data frame to the sum of the initial data frame and a version that is just duplicates of the "Experiment 1" row:
cbind(dat, dat[,-1] + dat[rep(which(dat$experimentName == "Experiment 1"), nrow(dat)), -1])
# experimentName UniquePageView UniqueFrequency NonUniqueFrequency UniquePageView UniqueFrequency
# 1 Experiment 1 459 294 359 918 588
# 2 Experiment 2 440 286 338 899 580
# 3 Experiment 3 428 273 348 887 567
# NonUniqueFrequency
# 1 718
# 2 697
# 3 707
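To try the one-liner above, a reproducible input is needed. The following is a minimal sketch that rebuilds the question's table as a data frame named dat (the name the answer uses; the construction itself is an assumption, not taken from the original post) and splits the one-liner into named steps. The helper names ctrl and pooled are illustrative only.

```r
# Assumed reconstruction of the question's data
dat <- data.frame(
  experimentName     = c("Experiment 1", "Experiment 2", "Experiment 3"),
  UniquePageView     = c(459, 440, 428),
  UniqueFrequency    = c(294, 286, 273),
  NonUniqueFrequency = c(359, 338, 348)
)

# Row index of the control experiment
ctrl <- which(dat$experimentName == "Experiment 1")

# Repeat the control row once per row of dat, drop the name column,
# and add it element-wise to the numeric columns of dat
pooled <- dat[, -1] + dat[rep(ctrl, nrow(dat)), -1]

# Bind the pooled columns to the right of the original data frame
res <- cbind(dat, pooled)
print(res)
```

Note that cbind keeps the duplicated column names, so the pooled columns are easiest to address by position (columns 5 to 7) until they are renamed.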
To update the column names at the end (assuming you stored the resulting data frame in res), you could use:
# The pooled columns are columns 5 to 7 (column 1 is the name, 2 to 4 are the originals)
names(res)[5:7] <- c("CombinedPageView", "CombinedUniqueFrequency", "CombinedNonUniqueFrequency")
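Since the question mentions an arbitrary number of columns, typing the new names by hand may not scale. One possible variant, sketched here under the assumption that exactly one row is "Experiment 1" and that every numeric column should be pooled, selects the numeric columns automatically and derives the new names with paste0. The "Pooled" prefix and the variable names are illustrative choices, not part of the original answer.

```r
# Assumed reconstruction of the question's data
dat <- data.frame(
  experimentName     = c("Experiment 1", "Experiment 2", "Experiment 3"),
  UniquePageView     = c(459, 440, 428),
  UniqueFrequency    = c(294, 286, 273),
  NonUniqueFrequency = c(359, 338, 348)
)

# Names of all numeric columns, however many there are
num_cols <- names(dat)[sapply(dat, is.numeric)]

# Control values as a plain numeric vector (assumes a single control row)
ctrl_vals <- unlist(dat[dat$experimentName == "Experiment 1", num_cols])

# Add the control value of each column to every row of that column
pooled <- as.data.frame(sweep(as.matrix(dat[num_cols]), 2, ctrl_vals, `+`))

# Derive the new column names from the original ones
names(pooled) <- paste0("Pooled", num_cols)

res <- cbind(dat, pooled)
print(res)
```

Because the names are generated, no positional index such as 5:7 has to be kept in sync with the number of columns.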
Do you know how to use dplyr? If you're new to R, it is well worth learning. dplyr includes the functions filter and summarise, which are all you need for this problem. Very simple!
First, take your data frame:
df
Then, filter to only the rows you want, in this case those where experimentName is "Experiment 1":
df
df <- filter(df, experimentName == "Experiment 1")
Now, summarise to find the sums of UniquePageView, UniqueFrequency and NonUniqueFrequency:
df
df <- filter(df, experimentName == "Experiment 1")
summarise(df, SumUniquePageView = sum(UniquePageView),
SumUniqueFrequency = sum(UniqueFrequency),
SumNonUniqueFrequency = sum(NonUniqueFrequency))
This will return a small table with the answers you're looking for. For a slightly more advanced (but more concise) way to do this, you can use the pipe operator %>% from the magrittr package, which dplyr makes available when it is loaded. It takes the result of the previous expression and passes it as the first argument to the following function call, as follows:
df %>% filter(experimentName == "Experiment 1") %>%
  summarise(SumUniquePageView = sum(UniquePageView), SumUniqueFrequency = sum(UniqueFrequency), SumNonUniqueFrequency = sum(NonUniqueFrequency))
If you don't yet have the package, you can install it with install.packages("dplyr") and then load it with library(dplyr).
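The filter and summarise steps return totals for the Experiment 1 rows only. If the goal is the per-row pooled columns shown in the question, one possible dplyr sketch uses mutate with across instead. It assumes dplyr 1.0.0 or later, exactly one "Experiment 1" row, and that every numeric column should be pooled; the data frame construction and the "Pooled" prefix are illustrative assumptions.

```r
library(dplyr)

# Assumed reconstruction of the question's data
df <- data.frame(
  experimentName     = c("Experiment 1", "Experiment 2", "Experiment 3"),
  UniquePageView     = c(459, 440, 428),
  UniqueFrequency    = c(294, 286, 273),
  NonUniqueFrequency = c(359, 338, 348)
)

out <- df %>%
  mutate(across(
    where(is.numeric),
    # Add the control row's value of the same column to every row
    ~ .x + .x[experimentName == "Experiment 1"],
    # Keep the originals and write the results to new, prefixed columns
    .names = "Pooled{.col}"
  ))

print(out)
```

Because across selects columns by type, the same code should keep working when more numeric columns or more experiments are added.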