How to use a for loop to use ddply on multiple columns?

I have found a few very similar Stack Overflow questions, but the answers are not what I am looking for ("Loop through columns and apply ddply" and "Aggregate / summarize multiple variables per group (i.e. sum, mean, etc.)").

The main difference is that those answers simplify the problem in a way that does not use a for loop (or apply) but uses aggregate (or similar) instead. However, I already have a large chunk of code working smoothly to do various summaries, stats, and plots, so what I would really like is to get a loop or function working. The issue I am currently facing is going from the column name stored as a string in q inside the loop to the actual column (get() is not working for me). See below.

My data set is similar to below but with 40 features:

Subject <- c(rep(1, times = 6), rep(2, times = 6))
GroupOfInterest <- c(letters[rep(1:3, times = 4)])
Feature1 <- sample(1:20, 12, replace = T)
Feature2 <- sample(400:500, 12, replace = T)
Feature3 <- sample(1:5, 12, replace = T)
df.main <- data.frame(Subject,GroupOfInterest, Feature1, Feature2, 
Feature3, stringsAsFactors = FALSE)

My attempts so far have used a for loop:

Feat <- c(colnames(df.main[3:5]))    
for (q in Feat){
df_sum = ddply(df.main, ~GroupOfInterest + Subject,
            summarise, q =mean(get(q)))
  }

I hope this will produce output like the one below (although I realize that, as written, a separate merge step would be needed):

[Image: expected output table]

However, depending on how I do it, I either get an error ("Error in get(q) : invalid first argument") or it averages all values of a feature rather than grouping by Subject and GroupOfInterest.

I have also tried using lists and lapply but am running into similar difficulties.

From what I have gathered, my issue is that ddply expects the bare column name Feature1. If I loop through, I provide it with either "Feature1" (a string) or the values themselves (1, 14, 14, 16, 17, ...), which are no longer part of the data frame and so can no longer be grouped by Subject and GroupOfInterest.
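To illustrate the distinction described above, here is a minimal base R sketch. The small data frame df and its values are made up for the example and are not the data from the question. It shows that a string such as "Feature1" is only a name, and that `[[` turns that name back into a column of whichever (sub-)data frame it is applied to, so the grouping is preserved:

```r
# Hypothetical toy data, not the data set from the question
df <- data.frame(
  Subject  = c(1, 1, 2, 2),
  Feature1 = c(10, 20, 30, 50)
)

q <- "Feature1"   # the column name held as a string, as in the loop

# Split the data frame into one piece per Subject, then look the column up
# by name inside each piece with [[ ]]
pieces <- split(df, df$Subject)
group_means <- sapply(pieces, function(d) mean(d[[q]]))

group_means
# Subject 1 -> 15, Subject 2 -> 40
```

The same `d[[q]]` lookup can be used inside any function that receives one group at a time.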

Thanks so much for any help you can offer with solving this problem and teaching me how this process works.

Edited based on a comment: as.character(.) needs to be included.

Could you use summarise_at, together with the helper functions vars(contains(...))?

library(dplyr)

# funs() is deprecated in current dplyr; a formula lambda (~) does the same job
df.main %>% 
    group_by(Subject, GroupOfInterest) %>% 
    summarise_at(vars(contains("Feature")), ~ mean(as.numeric(as.character(.))))
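In dplyr 1.0.0 and later the scoped verbs such as summarise_at are superseded by across(). A sketch of the equivalent call, assuming a recent dplyr is installed and that the feature columns are already numeric (the set.seed call and the result name res are additions for the example):

```r
library(dplyr)

set.seed(42)  # assumption: fixed seed so the example is reproducible
df.main <- data.frame(
  Subject         = rep(c(1, 2), each = 6),
  GroupOfInterest = letters[rep(1:3, times = 4)],
  Feature1        = sample(1:20, 12, replace = TRUE),
  Feature2        = sample(400:500, 12, replace = TRUE),
  Feature3        = sample(1:5, 12, replace = TRUE),
  stringsAsFactors = FALSE
)

# One row per Subject/GroupOfInterest pair, one mean per Feature column
res <- df.main %>%
  group_by(Subject, GroupOfInterest) %>%
  summarise(across(contains("Feature"), mean), .groups = "drop")

res
```

If the columns were stored as factors, the as.numeric(as.character(.)) conversion from the answer would still be needed inside the function passed to across().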

The dplyr solution is given above, but to be fair, here is the data.table one:

library(data.table)
DT <- setDT(df.main)  # converts df.main to a data.table by reference
DT[, lapply(.SD, function(x) mean(as.numeric(as.character(x)))),
   .SDcols = names(DT)[grepl("Feature", names(DT))],
   by = .(Subject, GroupOfInterest)]

   Subject GroupOfInterest Feature1 Feature2 Feature3
1:       1               a      6.5    459.5      2.0
2:       1               b     11.0    480.5      4.0
3:       1               c      9.5    453.0      4.5
4:       2               a      3.5    483.0      1.5
5:       2               b      8.0    449.0      3.5
6:       2               c     11.5    424.0      1.0

The OP asked for a simple for loop to do this transformation. I understand that there are many other, more optimized ways to solve this, but to respect the OP's wishes I have tried a for-loop based solution. I have used dplyr, as plyr is old now.

library(dplyr)
Subject <- c(rep(1, times = 6), rep(2, times = 6))
GroupOfInterest <- c(letters[rep(1:3, times = 4)])
Feature1 <- sample(1:20, 12, replace = T)
Feature2 <- sample(400:500, 12, replace = T)
Feature3 <- sample(1:5, 12, replace = T)
#small change in the way data.frame is created 
df.main <- data.frame(Subject,GroupOfInterest, Feature1, Feature2, 
 Feature3, stringsAsFactors = FALSE)

Feat <- c(colnames(df.main[3:5])) 

# Ready with Key columns on which grouping is done
resultdf <- unique(select(df.main, Subject, GroupOfInterest))
#> resultdf
#  Subject GroupOfInterest
#1       1               a
#2       1               b
#3       1               c
#7       2               a
#8       2               b
#9       2               c


# For loop over each feature column
for (q in Feat) {
  # summarise_() is deprecated in current dplyr; .data[[q]] looks the column up
  # by its string name, and !!q := names the result column after it
  df_sum <- df.main %>%
    group_by(Subject, GroupOfInterest) %>%
    summarise(!!q := mean(.data[[q]]), .groups = "drop")
  # merge the new mean column into resultdf
  resultdf <- merge(resultdf, df_sum, by = c("Subject", "GroupOfInterest"))
}

# Final result
#> resultdf
#  Subject GroupOfInterest Feature1 Feature2 Feature3
#1       1               a      6.5    473.0      3.5
#2       1               b      4.5    437.0      2.0
#3       1               c     12.0    415.5      3.5
#4       2               a     10.0    437.5      3.0
#5       2               b      3.0    447.0      4.5
#6       2               c      6.0    462.0      2.5
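For completeness, here is a sketch of how the same loop could be written with plyr's ddply, as the question originally asked. It assumes plyr is installed. The set.seed call and the keys vector are additions for the example. Rather than using summarise, it passes ddply a function that receives each group's sub-data-frame d and looks the column up by its string name with d[[q]]:

```r
library(plyr)

set.seed(42)  # assumption: fixed seed so the example is reproducible
df.main <- data.frame(
  Subject         = rep(c(1, 2), each = 6),
  GroupOfInterest = letters[rep(1:3, times = 4)],
  Feature1        = sample(1:20, 12, replace = TRUE),
  Feature2        = sample(400:500, 12, replace = TRUE),
  Feature3        = sample(1:5, 12, replace = TRUE),
  stringsAsFactors = FALSE
)

keys <- c("Subject", "GroupOfInterest")   # grouping columns
Feat <- setdiff(colnames(df.main), keys)  # the feature columns to summarise

# Start from the unique key combinations and merge in one mean column per feature
resultdf <- unique(df.main[keys])
for (q in Feat) {
  df_sum <- ddply(df.main, keys, function(d) {
    # d holds the rows of one Subject/GroupOfInterest group;
    # setNames() gives the single result column the name stored in q
    setNames(data.frame(mean(d[[q]])), q)
  })
  resultdf <- merge(resultdf, df_sum, by = keys)
}

resultdf
```

Because the function works on each group's own rows, the mean is computed per Subject and GroupOfInterest rather than over the whole column.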
