
Perform any task on a dataframe based on the column name, which is input by the user

A system extracts all the column names of a data.frame that is given as input, and the user gets to choose any of the variables from that dataset. Taking the mtcars dataset as the input, the columns the user can choose from are extracted as follows:

#to get all the column names and type
colNamesTypes<- as.data.frame(sapply(mtcars, typeof))
colNamesTypes<-cbind(Variable=rownames(colNamesTypes),colNamesTypes)
colnames(colNamesTypes)<-c("Variable","Type")
rownames(colNamesTypes)<-NULL

The column names are:

carnames mpg    cyl disp    hp  drat    wt  qsec    vs  am  gear    carb

(I converted the row.names of mtcars to a proper column, carnames, for convenience.)
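For reference, here is a minimal base R sketch of how the carnames column and the name/type table could be built together. The object name `cars_df` is an assumption used only for illustration; it does not appear in the original post.

```r
# cars_df is a hypothetical name; mtcars itself is left untouched
cars_df <- data.frame(carnames = rownames(mtcars), mtcars,
                      stringsAsFactors = FALSE)
rownames(cars_df) <- NULL  # drop the row names inherited from mtcars

# One row per column: its name and its storage type
colNamesTypes <- data.frame(
  Variable = names(cars_df),
  Type = unname(vapply(cars_df, typeof, character(1))),
  stringsAsFactors = FALSE
)
```

`vapply()` is used instead of `sapply()` here so that the result is guaranteed to be a character vector with one entry per column.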

Let's say the user selects mpg and hp and wants to find the sum of those two variables. We can do it in the following way:

library(dplyr)

UserVar1 <- "mpg"
UserVar2 <- "hp"
# the column names are hard-coded here; UserVar1 and UserVar2 are not used yet
summary1 = group_by(mtcars,mpg,hp)
summary1 = summarise(summary1, 
                      Sum_mpg = sum(mpg),
                      Sum_hp = sum(hp))

The above statements are enough to give the user the required analysis. The problem is that the group_by() and summarise() statements are not dynamic, i.e. if the user wants the analysis for some other variables, R has no way of knowing which new variables were selected.

So, how do I ask my summarise() to take UserVar1 and UserVar2 as the arguments, instead of the hard-coded column names?


I have tried using mtcars[UserVar1], which is analogous to mtcars["mpg"], but the output is a data.frame rather than a vector like the one mtcars$mpg gives, and hence I get an error in the summarise() statement.

summary1 = group_by(mtcars, v1 = unlist(mtcars[UserVar1]), 
                    v2 = unlist(mtcars[UserVar2]) )
summary1 = summarise(summary1, 
                     Sum_mpg = sum(v1),
                     Sum_hp = sum(v2) )

v1 and v2 are the names given to the vectors in the group_by function.

unlist(mtcars[UserVar1])

gives you a vector
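As a small aside on the point above: double-bracket indexing, `mtcars[[UserVar1]]`, also returns a vector when the column name is held in a string, and it does so without the element names that `unlist()` attaches. A quick base R check, offered as a sketch (the names `as_df`, `as_unlisted` and `as_vec` are made up for this illustration):

```r
UserVar1 <- "mpg"

as_df       <- mtcars[UserVar1]          # one-column data.frame
as_unlisted <- unlist(mtcars[UserVar1])  # numeric vector with names mpg1, mpg2, ...
as_vec      <- mtcars[[UserVar1]]        # plain numeric vector, same as mtcars$mpg

is.data.frame(as_df)
identical(as_vec, mtcars$mpg)
all.equal(unname(as_unlisted), as_vec)
```

Either form works inside `sum()`; the difference only matters where the extra names would get in the way.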

summary1 = summarise(summary1, 
                      # .data[[...]] looks up the column named by the string, within each group
                      Sum_mpg = sum(.data[[UserVar1]]),
                      Sum_hp = sum(.data[[UserVar2]]))

Gives the same output as providing the unquoted column names.

I prefer this piping method though:

mtcars %>% 
  group_by(mpg,hp) %>% 
  # .data[[...]] is evaluated per group; a bare . would refer to the whole input table
  summarise(Sum_mpg = sum(.data[[UserVar1]]),
            Sum_hp = sum(.data[[UserVar2]]))->summary1
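
If the user may pick any number of columns, one possible generalisation is to wrap the pipeline in a function that takes the column names as character vectors. The sketch below assumes dplyr 1.0 or later (for `across()` and `all_of()`); `sum_user_columns` is a hypothetical helper, not part of dplyr. It groups by `cyl` rather than by the summed columns themselves, which is an assumption made for the example, because `across()` inside `summarise()` does not operate on grouping columns.

```r
library(dplyr)

# sum_user_columns is a hypothetical helper, not a dplyr function.
# group_vars and sum_vars are character vectors of column names chosen by the user.
sum_user_columns <- function(data, group_vars, sum_vars) {
  data %>%
    group_by(across(all_of(group_vars))) %>%
    summarise(across(all_of(sum_vars), sum, .names = "Sum_{.col}"),
              .groups = "drop")
}

summary1 <- sum_user_columns(mtcars,
                             group_vars = "cyl",
                             sum_vars = c("mpg", "hp"))
```

The result has one row per value of `cyl` and one `Sum_` column per selected variable, so adding a third user-selected column only means extending `sum_vars`.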


 