简体   繁体   中英

apply function to column names using a list of data frames

I'm trying to apply a very complex function to a list of more than 50 Data Frames. Let's use a very simple function to lowercase names and just 3 data frames for the sake of clarity, but my general approach is coded below

[EDITED NAMES]
# Data Sample. Every column name is different accross Data Frames


quality <- data.frame(FIRST=c(1,5,3,3,2), SECOND=c(3,6,1,5,5))
thickness <- data.frame(THIRD=c(6,0,9,1,2), FOURTH=c(2,7,2,2,1))
distance <- data.frame(ONEMORE=c(0,0,1,5,1), ANOTHER=c(4,1,9,2,3))


# list of dataframes

dfs <- list(quality, thickness, distance)


# a very simple function (just for testing)
# actually a very complex one is used on real data

BetterNames <- function(x) {
    names(x) <- tolower(names(x))
  x
}


# apply function to data frame list

dfs <- lapply(dfs, BetterNames)

# I know the expected R behaviour is to modify a copy of the object,
# instead of the original object itself. So if you get the names
# you get the original version, not the needed one

names(quality)

[1] "FIRST"  "SECOND"

is there any way of using any function inside a loop or "apply" in place for a huge amount of data frames?
As a result we must get the modified one replacing the original one for every data frame in the list (big list)

I know there's a trick using Data Table, but I wonder if using base R is that possible.

Expected Results:

 names(quality)

    [1] "first"  "second"

[EDITED] Pointed out to this answer: Rename columns in multiple dataframes, R

But not working. You can't use a vector of string names in my case because my new names are not a fixed list of strings.[EDITED DATA]

for(df in dfs) {
  df.tmp <- get(df)
  names(df.tmp) <- BetterNames(df)
  assign(df, df.tmp)
}

> names(quality)
[1] "quality" NA  

Thanks

You already have the best case scenario:

Let's add some names to your list:

names(dfs) <- c("quality", "thickness", "distance")
dfs <- lapply(dfs, BetterNames)

dfs[["quality"]]
#   first second
# 1     1      3
# 2     5      6
# 3     3      1
# 4     3      5
# 5     2      5

This works great. And all your data is in a list, so if there are other things you want to do to all your data frames it is very easy.

If you are done treating these data frames similarly and really want them back in the global environment to work with individually, you can do it with

list2env(dfs, envir = .GlobalEnv)

I would recommend keeping them in a list though---in most cases if you have 50 data frames you are working with, in a list it is easy to use lapply or for loops to use them, but as individual objects you will be copy/pasting code and making mistakes.


I would consider even starting with 50 data frames in your workspace a problem - see How do I make a list of data frames? for recommendations on finding an upstream fix: going straight to a list from the start.

i'd use a simple yet effective parse & eval approach.

Let's use a for loop to compose a command that suited your needs:

for(df in dfs) {

command <- paste0("names(",df,") <- BetterNames(",df,")")
# print(command)
eval(parse(text=command))

}

names(quality)
[1] "first"  "second"

names(thickness)
[1] "third"  "fourth"

names(distance)
[1] "onemore"  "another"

This is for sure not optimal and I hope something better comes up but here it goes:

BetterNames <- function(x, y) {

    names(x) <- tolower(names(x))
    assign(y, x, envir = .GlobalEnv)

}

dfs <- list(quality, thickness, distance)
dfs2 <- c("quality", "thickness", "distance")
mapply(BetterNames, dfs, dfs2)

> names(quality)
[1] "first"  "second"

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM