
R summing row one with all rows

I am trying to analyse website data for A/B testing. My reference point is the row where experimentName = Experiment 1 (the control version).

  experimentName UniquePageView UniqueFrequency NonUniqueFrequency
1   Experiment 1            459             294                359
2   Experiment 2            440             286                338
3   Experiment 3            428             273                348

What I need to do is add the UniquePageView, UniqueFrequency and NonUniqueFrequency values from the row where experimentName = Experiment 1 to the corresponding values in every row.

For example:

UniquePageView WHERE experimentName = 'Experiment 1' + UniquePageView WHERE experimentName = 'Experiment 2',
UniquePageView WHERE experimentName = 'Experiment 1' + UniquePageView WHERE experimentName = 'Experiment 3'

and so on (I could have any number of experiments). Then I need to do the same for UniqueFrequency and NonUniqueFrequency (I could have any number of columns as well).

Result expected:

   experimentName  UniquePageView  UniqueFrequency  NonUniqueFrequency  Conversion Rate Pooled UniquePageView  Conversion Rate Pooled UniqueFrequency  Conversion Rate Pooled NonUniqueFrequency
1    Experiment 1             459              294                 359                                    918                                     588                                        718
2    Experiment 2             440              286                 338                                    899                                     580                                        697
3    Experiment 3             428              273                 348                                    887                                     567                                        707

here is the math behind it:

   experimentName  UniquePageView  UniqueFrequency  NonUniqueFrequency  Conversion Rate Pooled UniquePageView  Conversion Rate Pooled UniqueFrequency  Conversion Rate Pooled NonUniqueFrequency
1    Experiment 1             459              294                 359                              459 + 459                               294 + 294                                  359 + 359
2    Experiment 2             440              286                 338                              459 + 440                               294 + 286                                  359 + 338
3    Experiment 3             428              273                 348                              459 + 428                               294 + 273                                  359 + 348

In base R, you can do this in one line by column-binding (with cbind) the initial data frame to the sum of the initial data frame and a version of it that consists only of duplicates of the "Experiment 1" row.

cbind(dat, dat[,-1] + dat[rep(which(dat$experimentName == "Experiment 1"), nrow(dat)), -1])
#   experimentName UniquePageView UniqueFrequency NonUniqueFrequency UniquePageView UniqueFrequency
# 1   Experiment 1            459             294                359            918             588
# 2   Experiment 2            440             286                338            899             580
# 3   Experiment 3            428             273                348            887             567
#   NonUniqueFrequency
# 1                718
# 2                697
# 3                707
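
To see why the one-liner works, it can help to split it into steps. The sketch below rebuilds the question's data in a data frame called dat so that it runs on its own. The intermediate names ctrl_row, ctrl_rep and pooled are only illustrative, and the code assumes exactly one row matches "Experiment 1".

```r
# Rebuild the question's data so the example is self-contained
dat <- data.frame(
  experimentName     = c("Experiment 1", "Experiment 2", "Experiment 3"),
  UniquePageView     = c(459, 440, 428),
  UniqueFrequency    = c(294, 286, 273),
  NonUniqueFrequency = c(359, 338, 348)
)

# Row index of the control experiment (assumed to match exactly one row)
ctrl_row <- which(dat$experimentName == "Experiment 1")

# Repeat the control row once per row of dat, dropping the name column
ctrl_rep <- dat[rep(ctrl_row, nrow(dat)), -1]

# Element-wise sum of two data frames with identical dimensions
pooled <- dat[, -1] + ctrl_rep

# Bind the pooled columns onto the original data frame
res <- cbind(dat, pooled)
```

Because ctrl_rep has the same shape as dat[, -1], the addition works for any number of rows and numeric columns.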

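To update the column names at the end (assuming you stored the resulting data frame in res), you could use the following. The new columns are columns 5 to 7, because column 1 is experimentName and columns 2 to 4 are the original values:

names(res)[5:7] <- c("CombinedPageView", "CombinedUniqueFrequency", "CombinedNonUniqueFrequency")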
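
Since the question mentions an arbitrary number of columns, hard-coded positions and names may not scale well. A minimal sketch that derives both from the data is shown here. It rebuilds dat so it runs standalone, and the "Combined" prefix and the helper name new_cols are illustrative choices, not part of the original answer.

```r
# Rebuild the question's data so the example is self-contained
dat <- data.frame(
  experimentName     = c("Experiment 1", "Experiment 2", "Experiment 3"),
  UniquePageView     = c(459, 440, 428),
  UniqueFrequency    = c(294, 286, 273),
  NonUniqueFrequency = c(359, 338, 348)
)

# Same one-liner as above, with the control row index pulled out
ctrl <- which(dat$experimentName == "Experiment 1")
res <- cbind(dat, dat[, -1] + dat[rep(ctrl, nrow(dat)), -1])

# The pooled columns are everything after the original columns
new_cols <- (ncol(dat) + 1):ncol(res)

# Name each pooled column after the column it was derived from
names(res)[new_cols] <- paste0("Combined", names(dat)[-1])
```

This keeps the renaming correct even if more measurement columns are added to dat later.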

Do you know how to use dplyr? If you're new to R, it is well worth learning. dplyr includes the functions filter and summarise, which are all you need for this problem.

First, load dplyr and take your data frame:

library(dplyr)
df

Then filter to only the data you want, in this case the rows where experimentName is "Experiment 1":

df <- filter(df, experimentName == "Experiment 1")

Now summarise the filtered data frame to find the sums of UniquePageView, UniqueFrequency and NonUniqueFrequency:

summarise(df, SumUniquePageView = sum(UniquePageView),
              SumUniqueFrequency = sum(UniqueFrequency),
              SumNonUniqueFrequency = sum(NonUniqueFrequency))

This returns a one-row table containing the three Experiment 1 totals. For a slightly more advanced (but more concise) way to do this, you can use the pipe operator %>% from the magrittr package, which dplyr re-exports. It takes the result of the previous statement and passes it as the first argument to the following one:

df %>%
  filter(experimentName == "Experiment 1") %>%
  summarise(SumUniquePageView = sum(UniquePageView),
            SumUniqueFrequency = sum(UniqueFrequency),
            SumNonUniqueFrequency = sum(NonUniqueFrequency))
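
The filter/summarise pipeline yields only the Experiment 1 totals. To get the per-row pooled columns shown in the question's expected result, one possible dplyr sketch uses mutate with across instead. It assumes dplyr 1.0.0 or later and exactly one "Experiment 1" row. The "Pooled" prefix is an illustrative naming choice, and df is rebuilt from the question's data so the example runs on its own.

```r
library(dplyr)

# Rebuild the question's data so the example is self-contained
df <- data.frame(
  experimentName     = c("Experiment 1", "Experiment 2", "Experiment 3"),
  UniquePageView     = c(459, 440, 428),
  UniqueFrequency    = c(294, 286, 273),
  NonUniqueFrequency = c(359, 338, 348)
)

res <- df %>%
  mutate(across(
    where(is.numeric),
    # Add the control row's value of this column to every row
    ~ .x + .x[experimentName == "Experiment 1"],
    # Write results to new columns instead of overwriting the originals
    .names = "Pooled{.col}"
  ))
```

Selecting with where(is.numeric) means any additional numeric columns are handled without listing them by name.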

If you don't yet have the package, you can install it with install.packages("dplyr") and then load it with library(dplyr).


 