Using Strings to Identify Sequence of Column Names in R

I am currently trying to use predefined strings to identify multiple column names in R. To be more explicit, I am using the ave function to create identification variables for subgroups of a data frame. The twist is that I want the identification variables to be flexible, so that I can simply pass the column names as a generic string.

A sample code would be:

ids = with(df,ave(rep(1,nrow(df)),subcolumn1,subcolumn2,subcolumn3,FUN=seq_along))

I would like to run this code in the following fashion (code below does not work as expected):

subColumnsString = c("subcolumn1","subcolumn2","subcolumn3")
ids = with(df,ave(rep(1,nrow(df)),subColumnsString ,FUN=seq_along))

I tried something with eval, but it still did not work:

subColumnsString = c("subcolumn1","subcolumn2","subcolumn3")
ids = with(df,ave(rep(1,nrow(df)),eval(parse(text=subColumnsString)),FUN=seq_along))

Any ideas? Thanks.

EDIT: Working code example of what I want:

df = mtcars
id_names = c("vs","am")
idDF_correct = transform(df,idItem = as.numeric(interaction(vs,am)))
idDF_wrong = cbind(df,ave(rep(1,nrow(df)),df[id_names],FUN=seq_along))

Note how in idDF_correct, the unique combinations are correctly mapped into unique values of idItem. In idDF_wrong this is not the case.

I think this achieves what you requested. Here I use the mtcars dataset that ships with R:

subColumnsString <- c("cyl","gear")

ids = with(mtcars, ave(rep(1,nrow(mtcars)), mtcars[subColumnsString], FUN=seq_along))

Just index your data.frame with the character vector of column names. This returns a data frame, which is a list of columns, and ave accepts it directly as the grouping argument.
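To illustrate why indexing by name works where the eval attempt does not, here is a small sketch on mtcars. The variable names cols, explicit, by_name and last_only are only for illustration. It relies on eval returning the value of the last expression when given several, so the eval(parse(...)) attempt ends up grouping by the last column alone.

```r
cols <- c("cyl", "gear")

# Explicit column names, as in the question's first snippet.
explicit <- with(mtcars, ave(rep(1, nrow(mtcars)), cyl, gear, FUN = seq_along))

# Same grouping, with the columns selected from a character vector.
by_name <- ave(rep(1, nrow(mtcars)), mtcars[cols], FUN = seq_along)

# Both approaches give the same within-group counter.
identical(explicit, by_name)

# parse() yields one expression per string; eval() evaluates them all
# but returns only the last value, i.e. the gear column here.
last_only <- eval(parse(text = cols), envir = mtcars)
identical(last_only, mtcars$gear)
```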

EDIT

ids = ave(rep(1,nrow(mtcars)), mtcars[subColumnsString], FUN=seq_along)

You can omit the with and just call plain ol' ave, as G. Grothendieck pointed out. You should also prefer their answer, as it is much more general.
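Note that ave with seq_along numbers the rows within each group. If, as the question's edit suggests, the goal is instead one identifier per unique combination, the same indexing idea may carry over to interaction, which also accepts a list of columns. A minimal sketch, assuming the id_names vector from the question (group_id and reference are illustrative names):

```r
id_names <- c("vs", "am")

# interaction() accepts a list of factors, so a data frame subset works.
group_id <- as.numeric(interaction(mtcars[id_names]))

# Hard-coded version from the question's edit, for comparison.
reference <- transform(mtcars, idItem = as.numeric(interaction(vs, am)))$idItem

identical(group_id, reference)
```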

This defines a function whose arguments are:

  • data, the input data frame
  • by, a character vector of column names in data
  • fun, a function to use in ave

Code:

Ave <- function(data, by, fun = seq_along) {
   do.call(function(...) ave(rep(1, nrow(data)), ..., FUN = fun), data[by])
}

# test 
Ave(CO2, c("Plant", "Treatment"), seq_along)

giving:

 [1] 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3
[39] 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6
[77] 7 1 2 3 4 5 6 7
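As a quick sanity check, the sketch below repeats the Ave definition so it runs on its own, compares it with a direct ave call on mtcars, and swaps in length to get the group size on every row. The names counter, direct and size are illustrative only.

```r
Ave <- function(data, by, fun = seq_along) {
  do.call(function(...) ave(rep(1, nrow(data)), ..., FUN = fun), data[by])
}

# Within-group row counter, with the grouping columns given as strings.
counter <- Ave(mtcars, c("cyl", "gear"))
direct <- ave(rep(1, nrow(mtcars)), mtcars$cyl, mtcars$gear, FUN = seq_along)
identical(counter, direct)

# Passing a different function: the size of each row's group.
size <- Ave(CO2, c("Plant", "Treatment"), length)
unique(size)
```

One caveat, as far as I can tell: do.call passes the columns as named arguments, so a column called x or FUN would clash with ave's own arguments. Passing unname(as.list(data[by])) instead of data[by] should avoid that.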
