
Avoiding and renaming .x and .y columns when merging or joining in R

I often join two data frames that have columns with the same name. Is there a way to handle this within the join step so that I don't end up with .x and .y columns? For example, so that the names would be 'original_mpg' and 'new_mpg'?

  library(dplyr)
  joined <- left_join(mtcars, mtcars[, c("mpg", "cyl")], by = "cyl")
  names(joined)  # ugh: contains "mpg.x" and "mpg.y"

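At the time this answer was written, this was an open issue in dplyr. You had to either rename the columns before or after the join, or use merge() from base R, which takes a suffixes argument. Later dplyr versions added a suffix argument to the join functions.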
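Both workarounds can be sketched in base R. The object names merged, right and renamed are illustrative only, and all.x = TRUE is an assumption chosen so that merge() keeps every row of the left table, the way left_join() does:

```r
# Workaround 1: base R merge() with its `suffixes` argument.
# all.x = TRUE keeps every row of the left table, like left_join().
merged <- merge(mtcars, mtcars[, c("mpg", "cyl")], by = "cyl",
                all.x = TRUE, suffixes = c("_original", "_new"))

# Workaround 2: rename the clashing column before joining,
# so no suffixes are ever needed.
right <- mtcars[, c("mpg", "cyl")]
names(right)[names(right) == "mpg"] <- "new_mpg"
renamed <- merge(mtcars, right, by = "cyl", all.x = TRUE)

names(merged)   # contains "mpg_original" and "mpg_new"
names(renamed)  # contains "mpg" and "new_mpg"
```

Renaming first works with any join function, dplyr's included, because the clash never reaches the join.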

The default suffixes, c(".x", ".y"), can be overridden by passing a character vector of length 2 to the suffix argument:

library(dplyr)
left_join(mtcars, mtcars[, c("mpg", "cyl")],
          by = "cyl",
          suffix = c("_original", "_new")) %>%
  head()

Output

 mpg_original cyl disp  hp drat   wt  qsec vs am gear carb mpg_new
1           21   6  160 110  3.9 2.62 16.46  0  1    4    4    21.0
2           21   6  160 110  3.9 2.62 16.46  0  1    4    4    21.0
3           21   6  160 110  3.9 2.62 16.46  0  1    4    4    21.4
4           21   6  160 110  3.9 2.62 16.46  0  1    4    4    18.1
5           21   6  160 110  3.9 2.62 16.46  0  1    4    4    19.2
6           21   6  160 110  3.9 2.62 16.46  0  1    4    4    17.8

You can combine suffix with a slightly modified version of the strReverse() function from the strsplit help page to turn the suffix into a prefix:

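library(dplyr)
mt_cars <- left_join(mtcars, mtcars[, c("mpg", "cyl")],
                     by = "cyl",
                     suffix = c("_original", "_new"))
# Adapted from the strReverse() example in ?strsplit: split each name on "_",
# reverse the pieces and paste them back, so "mpg_original" -> "original_mpg".
# Caveat: any name that already contains "_" gets its pieces reordered too.
strReverse <- function(x) {
  sapply(lapply(strsplit(x, "_"), rev), paste, collapse = "_")
}
colnames(mt_cars) <- strReverse(colnames(mt_cars))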
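Because strReverse() reorders every name containing an underscore, a narrower variant may be safer when other columns already use underscores. This is a minimal base R sketch using a hypothetical helper, suffix_to_prefix(), written only for this example (endsWith() requires R 3.3.0 or later); it rewrites only names that end in one of the given suffixes:

```r
# Hypothetical helper (not part of dplyr or base R): turn join suffixes into
# prefixes, touching only names that end with "<sep><suffix>".
suffix_to_prefix <- function(nms, suffixes = c("original", "new"), sep = "_") {
  for (s in suffixes) {
    ending <- paste0(sep, s)                  # e.g. "_original"
    hit <- endsWith(nms, ending)
    if (any(hit)) {
      stem <- substr(nms[hit], 1, nchar(nms[hit]) - nchar(ending))
      nms[hit] <- paste0(s, sep, stem)        # e.g. "original_mpg"
    }
  }
  nms
}

# Names that merely contain "_" (like "wt_ratio") are left alone
suffix_to_prefix(c("mpg_original", "cyl", "wt_ratio", "mpg_new"))
# returns c("original_mpg", "cyl", "wt_ratio", "new_mpg")

# Applied to a base R merge() that used the same suffixes
joined <- merge(mtcars, mtcars[, c("mpg", "cyl")], by = "cyl",
                all.x = TRUE, suffixes = c("_original", "_new"))
names(joined) <- suffix_to_prefix(names(joined))
```

The same call works on the result of left_join(..., suffix = c("_original", "_new")), since only the column names are involved.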

I had a similar question when I found this post, and came up with a different solution that I hope helps.

The solution is fairly simple: put all the data frames you want to merge in a list and combine them with Reduce().

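# Reduce() and merge() are both base R, so no package needs to be loaded.
# df1, df2 and df3 stand for your own data frames.
df_list <- list(df1, df2, df3)
# Merge the first two, then merge that result with the third, and so on;
# all = TRUE keeps rows that have no match (a full outer join)
df <- Reduce(function(x, y) merge(x, y, all = TRUE), df_list)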
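To see the fold on concrete data, here is a self-contained sketch; the three toy tables below are made up and stand in for your own data frames. They share only the column id, so that column becomes the join key:

```r
# Three small, made-up data frames that share only the column "id"
df1 <- data.frame(id = 1:3, a = c(10, 20, 30))
df2 <- data.frame(id = 2:4, b = c("x", "y", "z"))
df3 <- data.frame(id = c(1L, 4L), c = c(TRUE, FALSE))

df_list <- list(df1, df2, df3)
# Equivalent to merge(merge(df1, df2, all = TRUE), df3, all = TRUE)
df <- Reduce(function(x, y) merge(x, y, all = TRUE), df_list)
df  # 4 rows (id 1 to 4) with columns id, a, b, c; gaps filled with NA
```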

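This was a solution to another problem I had: I wanted to simplify merging multiple data frames. With only two data frames in the list it works the same way, and merge() does not rename any columns. That is because, when no by argument is given, merge() joins on every column the data frames have in common, so no duplicated non-key columns are left to suffix. Be aware that this also changes the join itself: in the question's example it would match on both mpg and cyl rather than on cyl alone.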
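A short base R check of that behaviour on the question's mtcars example (the names rhs and same_keys are illustrative only):

```r
# With no `by` argument, merge() joins on intersect(names(x), names(y)).
rhs <- mtcars[, c("mpg", "cyl")]
intersect(names(mtcars), names(rhs))
# returns c("mpg", "cyl"): both columns become join keys

same_keys <- merge(mtcars, rhs, all = TRUE)
# No mpg.x / mpg.y: mpg is a key now, so nothing is left to suffix.
# The key columns come first, followed by the remaining mtcars columns.
names(same_keys)
```

So this approach avoids suffixes only because the shared column stops being a value column; if you need both the original and the new mpg side by side, use suffix (or suffixes in merge()) instead.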

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this content, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM