
rename column in dataframe using variable name R

I have a number of data frames, each with the same format, like this:

           A           B          C
1  -0.02299388  0.71404158  0.8492423
2  -1.43027866 -1.96420767 -1.2886368
3  -1.01827712 -0.94141194 -2.0234436

I would like to change the name of the third column (C) so that it includes part of the name of the variable that holds the data frame.

For the variable df_elephant the data frame should look like this:

     A           B          C.elephant
1  -0.02299388  0.71404158  0.8492423
2  -1.43027866 -1.96420767 -1.2886368
3  -1.01827712 -0.94141194 -2.0234436

I have a function which will change the column name:

rename_columns <- function(x) {

  colnames(x)[colnames(x)=='C'] <-
    paste( 'C',
           strsplit (deparse (substitute(x)), '_')[[1]][2], sep='.' ) 
  return(x)
}

This works with my data frames. However, I would like to provide a list of data frames so that I do not have to call the function multiple times by hand. If I use lapply like so:

lapply( list (df_elephant, df_horse), rename_columns )

The function renames the column with NA rather than the relevant portion of the variable name:

[[1]]
         A            B       C.NA
1  -0.02299388  0.71404158  0.8492423
2  -1.43027866 -1.96420767 -1.2886368
3  -1.01827712 -0.94141194 -2.02344361

[[2]]
         A            B       C.NA
1   0.45387054  0.02279488  1.6746280
2  -1.47271378  0.68660595 -0.2505752
3   1.26475917 -1.51739927 -1.3050531

Is there some way that I can provide a list of data frames to my function and produce the desired result?

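Inside lapply, your function no longer sees the variable name. deparse(substitute(x)) returns the expression that lapply uses internally to pick out each element, which contains no underscore, so the second piece of the split is NA. Work from the names of the list instead.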
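To see the cause directly, here is a minimal sketch. The helper show_arg and the toy data are made up for illustration. It compares what deparse(substitute(x)) yields when the function is called by hand and when it is called through lapply:

```r
# Toy data frame standing in for the real one (values are made up)
df_elephant <- data.frame(A = 1:3, B = 4:6, C = 7:9)

# Hypothetical helper: returns the expression the caller passed as `x`
show_arg <- function(x) deparse(substitute(x))

# Called by hand, the expression is the variable name itself
direct <- show_arg(df_elephant)

# Called through lapply, the expression is lapply's internal indexing
# expression (typically "X[[i]]" in current versions of R)
wrapped <- lapply(list(df_elephant), show_arg)[[1]]

# No underscore in the wrapped expression, so the second piece is missing
suffix <- strsplit(wrapped, "_")[[1]][2]

print(direct)
print(wrapped)
print(paste("C", suffix, sep = "."))  # "C.NA", as in the question
```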

# Generating random data
n = 3
item1 = data.frame(A = runif(n), B = runif(n), C = runif(n))
item2 = data.frame(A = runif(n), B = runif(n), C = runif(n))
myList = list(df_elephant = item1,  df_horse = item2)


# 1- Why your code doesn't work: ---------------
names(myList) # The names you actually want to use: [1] "df_elephant" "df_horse"
lapply(myList, names) # The data frames' column names (A, B, C); the list names never reach the function, hence the "NA"


# 2- How to make it work: ---------------
lapply(seq_along(myList), # Iterate over the list indices 1, 2, ...

       function(i){
         dfName = names(myList)[i] # Get the list name
         dfName.animal = unlist(strsplit(dfName, "_"))[2] # Split on underscore and take the second element

         df = myList[[i]] # Copy the actual Data frame 
         colnames(df)[colnames(df) == "C"] = paste("C", dfName.animal, sep = ".") # Change column names

         return(df) # Return the new df 
       })


# [[1]]
# A          B C.elephant
# 1 0.8289368 0.06589051  0.2929881
# 2 0.2362753 0.55689663  0.4854670
# 3 0.7264990 0.68069346  0.2940342
# 
# [[2]]
# A         B   C.horse
# 1 0.08032856 0.4137106 0.6378605
# 2 0.35671556 0.8112511 0.4321704
# 3 0.07306260 0.6850093 0.2510791
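Note that the result above is an unnamed list, because lapply is iterating over indices rather than over the named list. If the names should be kept, one possible sketch is to wrap the call in setNames. The helper name rename_c and the toy values are assumptions made for illustration:

```r
# Toy stand-ins for the real data frames (values are made up)
myList <- list(
  df_elephant = data.frame(A = 1:3, B = 4:6, C = 7:9),
  df_horse    = data.frame(A = 3:1, B = 6:4, C = 9:7)
)

# Same renaming logic as above, wrapped in a hypothetical helper
rename_c <- function(i) {
  animal <- unlist(strsplit(names(myList)[i], "_"))[2]  # part after the underscore
  df <- myList[[i]]
  colnames(df)[colnames(df) == "C"] <- paste("C", animal, sep = ".")
  df
}

# setNames() restores the list names that lapply(seq_along(...)) drops
renamed <- setNames(lapply(seq_along(myList), rename_c), names(myList))

names(renamed)              # "df_elephant" "df_horse"
colnames(renamed$df_horse)  # "A" "B" "C.horse"
```

If the renamed data frames should replace the original variables, list2env(renamed, envir = .GlobalEnv) is one option, bearing in mind that it overwrites existing objects of the same name.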

We can use Map. Collect the datasets in a list (here mget returns the objects named by the strings as a list), then use Map to append to the name of the third column the part of the corresponding object name that follows the underscore.

 Map(function(x, y) {names(x)[3] <- paste(names(x)[3], sub(".*_", "", y), sep="."); x},  
     mget(c("df_elephant", "df_horse")), c("df_elephant", "df_horse"))
#$df_elephant
#            A          B  C.elephant
#1 -0.02299388  0.7140416   0.8492423
#2 -1.43027866 -1.9642077  -1.2886368
#3 -1.01827712 -0.9414119  -2.0234436

#$df_horse
#           A           B   C.horse
#1  0.4538705  0.02279488  1.6746280
#2 -1.4727138  0.68660595 -0.2505752
#3  1.2647592 -1.51739927 -1.3050531
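The sub(".*_", "", y) call and the strsplit approach from the question agree for names with a single underscore, but they differ when a name has more than one. A small sketch, where the second name is hypothetical and not from the question:

```r
nm <- c("df_elephant", "df_african_elephant")  # second name is hypothetical

# Greedy match: removes everything up to the LAST underscore
after_last <- sub(".*_", "", nm)

# Second piece of the split: only the text between the first two underscores
second_piece <- sapply(strsplit(nm, "_"), `[`, 2)

# Removes only the prefix up to the FIRST underscore
after_first <- sub("^[^_]*_", "", nm)

after_last    # "elephant" "elephant"
second_piece  # "elephant" "african"
after_first   # "elephant" "african_elephant"
```

Which variant is appropriate depends on how the data frames are actually named.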

You can also try the following. It is similar to Akrun's answer and also uses Map at the end:

# Your data
d <- read.table("clipboard")
# create a list with names A and B
d_list <- list(A=d, B=d)

# function
foo <- function(x, y){
  gr <- which(colnames(x) == "C") # get index of colnames C 
  tmp <- colnames(x) #new colnames vector
  tmp[gr] <- paste(tmp[gr], y, sep=".") # replace the old with the new colnames.
  setNames(x, tmp) # set the new names
}
# Result
Map(foo, d_list, names(d_list))
$A
            A          B        C.A
1 -0.02299388  0.7140416  0.8492423
2 -1.43027866 -1.9642077 -1.2886368
3 -1.01827712 -0.9414119 -2.0234436

$B
            A          B        C.B
1 -0.02299388  0.7140416  0.8492423
2 -1.43027866 -1.9642077 -1.2886368
3 -1.01827712 -0.9414119 -2.0234436
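The read.table("clipboard") call depends on the platform and on what has been copied, so the example is not reproducible as written. As a sketch, the same data from the question can be read from an inline string with the text argument instead:

```r
# Read the sample data from the question without using the clipboard.
# The header row has one field fewer than the data rows, so read.table
# uses the first column (1, 2, 3) as row names.
d <- read.table(header = TRUE, text = "
           A           B          C
1  -0.02299388  0.71404158  0.8492423
2  -1.43027866 -1.96420767 -1.2886368
3  -1.01827712 -0.94141194 -2.0234436
")

# Same named list as in the answer
d_list <- list(A = d, B = d)

colnames(d)  # "A" "B" "C"
```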


 