I am trying to analyse website data for A/B testing. My reference point is the row where experimentName = Experiment 1 (the control version).
   experimentName  UniquePageView  UniqueFrequency  NonUniqueFrequency
1  Experiment 1    459             294              359
2  Experiment 2    440             286              338
3  Experiment 3    428             273              348
What I need to do is add the Experiment 1 values of UniquePageView, UniqueFrequency and NonUniqueFrequency to the corresponding values in every row,
e.g.
UniquePageView WHERE experimentName = 'Experiment 1' + UniquePageView WHERE experimentName = 'Experiment 2',
UniquePageView WHERE experimentName = 'Experiment 1' + UniquePageView WHERE experimentName = 'Experiment 3'
and so on (I could have an unlimited number of experiments), then do the same for UniqueFrequency and NonUniqueFrequency (I could have an unlimited number of columns as well).
Result expected:
   experimentName  UniquePageView  UniqueFrequency  NonUniqueFrequency  Conversion Rate Pooled UniquePageView  Conversion Rate Pooled UniqueFrequency  Conversion Rate Pooled NonUniqueFrequency
1  Experiment 1    459             294              359                 918                                    588                                     718
2  Experiment 2    440             286              338                 899                                    580                                     697
3  Experiment 3    428             273              348                 887                                    567                                     707
here is the math behind it:
   experimentName  UniquePageView  UniqueFrequency  NonUniqueFrequency  Conversion Rate Pooled UniquePageView  Conversion Rate Pooled UniqueFrequency  Conversion Rate Pooled NonUniqueFrequency
1  Experiment 1    459             294              359                 459 + 459                              294 + 294                               359 + 359
2  Experiment 2    440             286              338                 459 + 440                              294 + 286                               359 + 338
3  Experiment 3    428             273              348                 459 + 428                              294 + 273                               359 + 348
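For reproducibility, the sample data can be rebuilt as a data frame and the arithmetic checked for one column. This is only a sketch: the object names dat, control_views and pooled_views are assumptions chosen for illustration, and it assumes exactly one row is labelled "Experiment 1".

```r
# Rebuild the sample table from the question
dat <- data.frame(
  experimentName     = c("Experiment 1", "Experiment 2", "Experiment 3"),
  UniquePageView     = c(459, 440, 428),
  UniqueFrequency    = c(294, 286, 273),
  NonUniqueFrequency = c(359, 338, 348),
  stringsAsFactors   = FALSE
)

# Control value for one metric (assumes a single "Experiment 1" row)
control_views <- dat$UniquePageView[dat$experimentName == "Experiment 1"]

# Control value added to every row's value: 459 + 459, 459 + 440, 459 + 428
pooled_views <- control_views + dat$UniquePageView
print(pooled_views)
```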
In base R, you can do this in one line: use cbind to column-bind the initial data frame (here called dat) to the sum of its numeric columns and a data frame made of repeated copies of the "Experiment 1" row.
cbind(dat, dat[,-1] + dat[rep(which(dat$experimentName == "Experiment 1"), nrow(dat)), -1])
# experimentName UniquePageView UniqueFrequency NonUniqueFrequency UniquePageView UniqueFrequency
# 1 Experiment 1 459 294 359 918 588
# 2 Experiment 2 440 286 338 899 580
# 3 Experiment 3 428 273 348 887 567
# NonUniqueFrequency
# 1 718
# 2 697
# 3 707
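To update the column names at the end (assuming you stored the resulting data frame in res), you could use the following. Note that the new columns are in positions 5 to 7, because column 1 is experimentName and columns 2 to 4 hold the original values:
names(res)[5:7] <- c("CombinedPageView", "CombinedUniqueFrequency", "CombinedNonUniqueFrequency")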
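Hard-coded column positions get brittle as the number of metric columns grows. As a sketch of a variant that scales to any number of numeric columns, sweep() can add the control row to every row, and the new names can be derived from the old ones. The data frame is rebuilt from the question's sample; the "Pooled" prefix and the names num_cols, control and pooled are assumptions for illustration, and exactly one "Experiment 1" row is assumed.

```r
dat <- data.frame(
  experimentName     = c("Experiment 1", "Experiment 2", "Experiment 3"),
  UniquePageView     = c(459, 440, 428),
  UniqueFrequency    = c(294, 286, 273),
  NonUniqueFrequency = c(359, 338, 348),
  stringsAsFactors   = FALSE
)

# Every numeric column is treated as a metric to pool
num_cols <- names(dat)[sapply(dat, is.numeric)]

# Named vector holding the control row's values (assumes a single match)
control <- unlist(dat[dat$experimentName == "Experiment 1", num_cols])

# Add control[j] to every entry of column j
pooled <- sweep(as.matrix(dat[num_cols]), 2, control, `+`)
colnames(pooled) <- paste0("Pooled", num_cols)

res <- cbind(dat, pooled)
print(res)
```

Because the columns are selected by type and named by prefix, adding more experiments (rows) or more metrics (numeric columns) needs no change to the code.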
Do you know how to use dplyr? If you're new to R, it is well worth learning. dplyr includes the functions filter and summarise, which are all you need to sum the Experiment 1 values.
First, take your data frame:
df
Then, filter to only the rows you want, in this case those where experimentName is "Experiment 1":
df <- filter(df, experimentName == "Experiment 1")
Now, summarise to find the sums of UniquePageView, UniqueFrequency and NonUniqueFrequency:
summarise(df,
          SumUniquePageView = sum(UniquePageView),
          SumUniqueFrequency = sum(UniqueFrequency),
          SumNonUniqueFrequency = sum(NonUniqueFrequency))
This will return a one-row table with those sums. For a slightly more advanced (but more concise) way to do this, you can use the pipe operator %>% from the magrittr package, which dplyr re-exports. It takes the result of the previous statement and passes it as the first argument to the following statement:
df %>%
  filter(experimentName == "Experiment 1") %>%
  summarise(SumUniquePageView = sum(UniquePageView),
            SumUniqueFrequency = sum(UniqueFrequency),
            SumNonUniqueFrequency = sum(NonUniqueFrequency))
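The filter/summarise pipeline returns a single row of totals for Experiment 1. If the per-row table from the question is what is needed, one possible sketch uses mutate() with across(), assuming dplyr 1.0.0 or later. The data frame is rebuilt from the question's sample, the "Pooled" prefix is an assumption for illustration, and exactly one "Experiment 1" row is assumed.

```r
library(dplyr)

df <- data.frame(
  experimentName     = c("Experiment 1", "Experiment 2", "Experiment 3"),
  UniquePageView     = c(459, 440, 428),
  UniqueFrequency    = c(294, 286, 273),
  NonUniqueFrequency = c(359, 338, 348),
  stringsAsFactors   = FALSE
)

# For each numeric column, add that column's Experiment 1 value to every row
# and store the result in a new column named "Pooled<original name>"
res <- df %>%
  mutate(across(
    where(is.numeric),
    ~ .x + .x[experimentName == "Experiment 1"],
    .names = "Pooled{.col}"
  ))

print(res)
```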
If you don't yet have dplyr, you can install it with install.packages("dplyr") and then load it with library(dplyr).