
Merge 2 dataframes with multiple columns of different column names

I need help with a merge (vlookup) problem that I cannot solve. I have two data frames that I would like to merge, and their key columns have different names. My real datasets have many columns, which is why it is hard for me to come up with a solution. I have tried the merge function, but I cannot figure out how to use it on multiple columns with different names. I would like to specify the column names explicitly, using something like:

output <- merge(df1, df.vlookup, by.x=????, by.y=???, ) #just where I am today

Here is a very simplified example:

id<-c(2,4,6,8,10,12,14,16,18,20,22,24,26,28)
bike <- c(1,3,2,1,1,1,2,3,2,3,1,1,1,1)
size <- c(1,2,1,2,1,2,1,2,1,2,1,2,1,2)
color <-c (10,11,13,15,12,12,12,11,11,14,12,11,10,10)
price <- c(1,2,2,2,1,3,1,1,2,1,2,1,2,1)


df1 <- data.frame(id,bike,size,color,price)

   id bike size color price
1   2    1    1    10     1
2   4    3    2    11     2
3   6    2    1    13     2
4   8    1    2    15     2
5  10    1    1    12     1
6  12    1    2    12     3
7  14    2    1    12     1
8  16    3    2    11     1
9  18    2    1    11     2
10 20    3    2    14     1
11 22    1    1    12     2
12 24    1    2    11     1
13 26    1    1    10     2
14 28    1    2    10     1


b1<-c(1,2,3)
b2<-c("Alan", "CCM", "Basso")
s1 <- c(1,2)
s2 <- c("L","S")
c1<-c(10,11,12,13,14,15)
c2 <-c("black","blue","green","red","pink")
p1<- c(1,2,3)
p2<- c(1000,2000,3000)

#trick for making a dataframe with unequal vector length
na.pad <- function(x,len){
  x[1:len]
}

makePaddedDataFrame <- function(l,...){
  maxlen <- max(sapply(l,length))
  data.frame(lapply(l,na.pad,len=maxlen),...)
}

df.vlookup <- makePaddedDataFrame(list(b1=b1,b2=b2,s1=s1,s2=s2,c1=c1,c2=c2,p1=p1,p2=p2))

> df.vlookup
  b1    b2 s1   s2 c1    c2 p1   p2
1  1  Alan  1    L 10 black  1 1000
2  2   CCM  2    S 11  blue  2 2000
3  3 Basso NA <NA> 12 green  3 3000
4 NA  <NA> NA <NA> 13   red NA   NA
5 NA  <NA> NA <NA> 14  pink NA   NA
6 NA  <NA> NA <NA> 15  <NA> NA   NA

Here is the data frame that I would like to end up with:

> df.final
   id bike    b2 size s2 color    c2 price
1   2    1  Alan    1  L    10 black     1
2   4    3 Basso    2  S    11  blue     2
3   6    2   CCM    1  L    13   red     2
4   8    1  Alan    2  S    15  #N/A     2
5  10    1  Alan    1  L    12 green     1
6  12    1  Alan    2  S    12 green     3
7  14    2   CCM    1  L    12 green     1
8  16    3 Basso    2  S    11  blue     1
9  18    2   CCM    1  L    11  blue     2
10 20    3 Basso    2  S    14  pink     1
11 22    1  Alan    1  L    12 green     2
12 24    1  Alan    2  S    11  blue     1
13 26    1  Alan    1  L    10 black     2
14 28    1  Alan    2  S    10 black     1   

I would really appreciate some help with this.

I don't think a single data frame for lookup values is the right approach. What about using named vectors?

For example:

bike_names <- c("Alan" = 1, "CCM" = 2, "Basso" = 3)
df1$b2 <- names(bike_names[ df1$bike ])
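Note that bike_names[ df1$bike ] indexes by position, so it works here only because the codes 1, 2, 3 coincide with the positions in the vector. For codes that are not 1..n, such as the colors 10 to 15, a lookup by value is needed. A minimal sketch using match(); the helper name lookup_label and the vector color_codes are made up for illustration:

```r
# Hypothetical helper: translate codes to labels via a named vector.
# `codes` holds label = code pairs, in the same style as bike_names.
lookup_label <- function(x, codes) {
  # match() gives the position of each x in codes, or NA when x is absent
  names(codes)[match(x, codes)]
}

color_codes <- c(black = 10, blue = 11, green = 12, red = 13, pink = 14)
color <- c(10, 11, 13, 15, 12)

lookup_label(color, color_codes)
# "black" "blue" "red" NA "green"
```

Code 15 has no label in color_codes, so it comes back as NA instead of raising an error.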

If using data frames, put each lookup table in a separate data frame.

lookup <- list(
  bike = data.frame( bike = c(1, 2, 3), bike_name = c("Alan", "CCM", "Basso")),
  size = data.frame(size = c(1, 2),  size_name = c("L", "S")),
  color = data.frame(color = c(10, 11, 12, 13, 14, 15), color_name = c("black", "blue", "green", "red", "pink", NA)),
  price = data.frame(price = c(1, 2, 3), price_name = c(1000, 2000, 3000))
)

And use it with merge:

Reduce(merge, c(data = list(df1), lookup))
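The Reduce call relies on each lookup table sharing its key column name with df1, and merge() by default performs an inner join and sorts the result by the key. If the key columns are named differently, as in the question, by.x and by.y can name them explicitly. A small sketch on a cut-down version of the data; the names bike_lookup and color_lookup are invented for this example:

```r
df1 <- data.frame(id = c(2, 4, 6, 8), bike = c(1, 3, 2, 1),
                  color = c(10, 11, 13, 15))

# Lookup tables whose key columns are named differently from df1's
bike_lookup  <- data.frame(b1 = c(1, 2, 3), b2 = c("Alan", "CCM", "Basso"),
                           stringsAsFactors = FALSE)
color_lookup <- data.frame(c1 = c(10, 11, 12, 13, 14),
                           c2 = c("black", "blue", "green", "red", "pink"),
                           stringsAsFactors = FALSE)

# by.x names the key in the left table, by.y the key in the right table;
# all.x = TRUE keeps rows of df1 that have no match (a left join)
out <- merge(df1, bike_lookup,  by.x = "bike",  by.y = "b1", all.x = TRUE)
out <- merge(out, color_lookup, by.x = "color", by.y = "c1", all.x = TRUE)

# merge() sorts by the key, so restore the original order by id
out <- out[order(out$id), c("id", "bike", "b2", "color", "c2")]
out
```

With all.x = TRUE the row with color 15 is kept and gets NA in c2, which corresponds to the #N/A in the desired output.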

Or use dplyr and joins:

library(dplyr)

df1 %>%
  left_join(lookup$bike, by = c("bike")) %>%
  left_join(lookup$size, by = c("size")) %>%
  left_join(lookup$color, by = c("color")) %>%
  left_join(lookup$price, by = c("price"))

Update

But if you really want to start from the df.vlookup data frame, you can convert it to a list of data frames like this:

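# Columns of df.vlookup come in (key, label) pairs: b1/b2, s1/s2, ...
lookup <- lapply(seq(1, to = ncol(df.vlookup), by = 2), function(i) {
  # the pair starting at column i belongs to column (i + 1) / 2 + 1 of df1
  setNames(df.vlookup[, c(i, i + 1)], c(names(df1)[(i + 1) / 2 + 1], names(df.vlookup)[i + 1]))
})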

And use it in a multiple merge:

Reduce(merge, c(data = list(df1), lookup))

NOTE: Creating the lookup list this way assumes that the columns of df.vlookup come in (key, label) pairs and that these pairs appear in the same order as the columns of df1 after id.
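If that column order cannot be guaranteed, one option is to spell out which df.vlookup columns belong to which df1 column. The following is only a sketch on a reduced data set; key_map is a hypothetical name, and dropping the NA padding rows is an extra precaution that is not part of the code above:

```r
df.vlookup <- data.frame(
  b1 = c(1, 2, 3), b2 = c("Alan", "CCM", "Basso"),
  s1 = c(1, 2, NA), s2 = c("L", "S", NA),
  stringsAsFactors = FALSE
)
df1 <- data.frame(id = c(2, 4, 6), bike = c(1, 3, 2), size = c(1, 2, 1))

# Hypothetical mapping: df1 column -> c(key column, label column) in df.vlookup
key_map <- list(bike = c("b1", "b2"), size = c("s1", "s2"))

lookup <- lapply(names(key_map), function(col) {
  tab <- df.vlookup[, key_map[[col]]]
  tab <- tab[!is.na(tab[[1]]), ]            # drop the NA padding rows
  setNames(tab, c(col, key_map[[col]][2]))  # rename the key to match df1
})

# Left-join every lookup table onto df1, then restore the order by id
out <- Reduce(function(x, y) merge(x, y, all.x = TRUE), c(list(df1), lookup))
out <- out[order(out$id), ]
out
```

Because the mapping is explicit, reordering the columns of either data frame does not change the result.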
