Matching multiple columns in two dataframes in R using the merge or match function
Merge two dataframes and create multiple columns in R
Suppose we have two data frames, as follows:
df1 <- data.frame(Team1 = c("A","B","C"), Team2 = c("D","E","F"), Winner = c("A","E","F"))
df2 <- data.frame(Country = c("A","B","C","D","E","F"), Index = c(1,2,3,4,5,6))
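What I want is to create three new columns, Team1_index, Team2_index and Winner_index, by looking up each team in df2:
Team1 Team2 Winner Team1_index Team2_index Winner_index
    A     D      A           1           4            1
    B     E      E           2           5            5
    C     F      F           3           6            6
I have tried many approaches, but none of them worked. Hints and suggestions are welcome!
If there are only a few columns, you can use the match function, as in this example: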
df1$Team1_index <- df2$Index[match(df1$Team1, df2$Country)]
df1$Team2_index <- df2$Index[match(df1$Team2, df2$Country)]
df1$Winner_index <- df2$Index[match(df1$Winner, df2$Country)]
df1
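One thing worth knowing about the match approach: a team that does not appear in df2$Country produces NA rather than an error. A minimal sketch, using a hypothetical extra team "Z" that is not in the original data (the names teams, pos and idx are introduced here for illustration only):

```r
# Lookup table from the question
df2 <- data.frame(Country = c("A","B","C","D","E","F"), Index = c(1,2,3,4,5,6))

# Hypothetical input: "Z" is not in df2$Country (an assumption for illustration)
teams <- c("A", "Z", "F")

# match() returns the position of each team in df2$Country, or NA if it is absent
pos <- match(teams, df2$Country)

# Indexing with NA propagates NA into the looked-up values
idx <- df2$Index[pos]
idx  # 1 NA 6
```

If unmatched teams are possible in real data, checking anyNA(idx) after the lookup is a cheap way to catch them.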
If you have more columns you may want a more systematic solution, but if there really are only three cases, this should do:
library("tidyverse")
df1 <- data.frame(Team1 = c("A","B","C"), Team2 = c("D","E","F"), Winner = c("A","E","F"))
df2 <- data.frame(Country = c("A","B","C","D","E","F"), Index = c(1,2,3,4,5,6))
df1 %>%
  left_join(df2 %>% rename(Team1 = Country), by = "Team1") %>%
  rename(Team1_Index = Index) %>%
  left_join(df2 %>% rename(Team2 = Country), by = "Team2") %>%
  rename(Team2_Index = Index) %>%
  left_join(df2 %>% rename(Winner = Country), by = "Winner") %>%
  rename(Winner_Index = Index)
#> Warning: Column `Team1` joining factors with different levels, coercing to
#> character vector
#> Warning: Column `Team2` joining factors with different levels, coercing to
#> character vector
#> Warning: Column `Winner` joining factors with different levels, coercing to
#> character vector
#>   Team1 Team2 Winner Team1_Index Team2_Index Winner_Index
#> 1     A     D      A           1           4            1
#> 2     B     E      E           2           5            5
#> 3     C     F      F           3           6            6
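You can safely ignore the warnings; they only say that the factor columns were coerced to character for the join.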
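The same join idea can also be sketched in base R with merge(), the function named in the title. This is an alternative, not the code from the answer above: the loop, the team_cols and lookup names, and the .row helper column are all introduced here for illustration, and stringsAsFactors = FALSE is set explicitly so the key columns are plain character.

```r
df1 <- data.frame(Team1 = c("A","B","C"), Team2 = c("D","E","F"),
                  Winner = c("A","E","F"), stringsAsFactors = FALSE)
df2 <- data.frame(Country = c("A","B","C","D","E","F"), Index = c(1,2,3,4,5,6),
                  stringsAsFactors = FALSE)

team_cols <- names(df1)
out <- df1
out$.row <- seq_len(nrow(out))  # hypothetical helper: remembers the original row order

for (col in team_cols) {
  # Rename df2 so its key matches the current column and its value gets a suffix
  lookup <- setNames(df2, c(col, paste0(col, "_index")))
  # all.x = TRUE keeps rows of `out` even if a team is missing from df2
  out <- merge(out, lookup, by = col, all.x = TRUE)
}

# merge() may reorder rows and columns, so restore both
out <- out[order(out$.row), c(team_cols, paste0(team_cols, "_index"))]
rownames(out) <- NULL
out
```

Compared with match, this costs an extra step to restore the row and column order, which is one reason the match-based answers are shorter.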
To get the new columns as factors:
df1[paste0(colnames(df1),"_index")] <- lapply(df1,factor,df2$Country,df2$Index)
#   Team1 Team2 Winner Team1_index Team2_index Winner_index
# 1     A     D      A           1           4            1
# 2     B     E      E           2           5            5
# 3     C     F      F           3           6            6
To get the new columns as numeric:
df1[paste0(colnames(df1),"_index")] <-
lapply(df1,function(x) as.numeric(as.character(factor(x,df2$Country,df2$Index))))
#   Team1 Team2 Winner Team1_index Team2_index Winner_index
# 1     A     D      A           1           4            1
# 2     B     E      E           2           5            5
# 3     C     F      F           3           6            6
Note that for this specific case (the index starts at 1 and increases by 1), this shorter version works:
df1[paste0(colnames(df1),"_index")] <-
lapply(df1,function(x) as.numeric(factor(x,df2$Country)))
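To see why the shorter version depends on the index running 1, 2, 3, ..., here is a small sketch with a hypothetical lookup table whose Index values are 10, 20, 30 (not from the original data; the names lookup, x, full and short are illustrative):

```r
# Hypothetical lookup table: Index is NOT 1, 2, 3 (assumption for illustration)
lookup <- data.frame(Country = c("A","B","C"), Index = c(10, 20, 30),
                     stringsAsFactors = FALSE)
x <- c("C", "A", "B")

# Full version: levels are the countries, labels are the Index values
full <- as.numeric(as.character(factor(x, lookup$Country, lookup$Index)))

# Shorter version: as.numeric() on a factor returns the level positions
short <- as.numeric(factor(x, lookup$Country))

full   # 30 10 20
short  # 3 1 2
```

The shorter form returns positions in lookup$Country, which coincide with the Index values only when Index is exactly 1..n in that order.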
Here is another option using match and cbind.
df3 <- as.matrix(df1)
colnames(df3) <- paste0(colnames(df3), "_index")
# match the positions
df3[] <- match(df3, df2$Country)
cbind(df1, df3)
#  Team1 Team2 Winner Team1_index Team2_index Winner_index
#1     A     D      A           1           4            1
#2     B     E      E           2           5            5
#3     C     F      F           3           6            6
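df3 is created as a matrix, i.e. a vector with a dim attribute, so we can replace all of its entries at once with the result of match (a vector) without repeating the code for each column.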
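One detail that may matter downstream: because df3 starts out as a character matrix, assigning into df3[] keeps that type, so the matched positions end up stored as text. A minimal sketch showing this, plus one way to build an integer matrix of the same shape instead (the name pos is introduced here for illustration):

```r
df1 <- data.frame(Team1 = c("A","B","C"), Team2 = c("D","E","F"),
                  Winner = c("A","E","F"), stringsAsFactors = FALSE)
df2 <- data.frame(Country = c("A","B","C","D","E","F"), Index = c(1,2,3,4,5,6),
                  stringsAsFactors = FALSE)

df3 <- as.matrix(df1)
colnames(df3) <- paste0(colnames(df3), "_index")

# Assigning into df3[] preserves dim and dimnames, but also the character type
df3[] <- match(df3, df2$Country)
typeof(df3)  # "character"

# Build an integer matrix with the same shape and column names instead
pos <- matrix(match(as.matrix(df1), df2$Country), nrow = nrow(df1),
              dimnames = list(NULL, colnames(df3)))
typeof(pos)  # "integer"
```

Whether this matters depends on what the index columns are used for afterwards; for printing, both variants look the same.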
Or in one go:
df1[paste0(colnames(df1), "_index")] <- match(as.matrix(df1), df2$Country)
Note, however, that this ignores the Index column of df2.
Thanks to @Moody_Mudskipper, we can also write this as
df1[paste0(colnames(df1), "_index")] <- lapply(df1, function(x) df2$Index[match(x, df2$Country)])
I have an almost-solution with data.table, using melt and dcast to reshape:
library(data.table)
df1 <- data.table(Team1 = c("A","B","C"), Team2 = c("D","E","F"), Winner = c("A","E","F"))
df2 <- data.table(Country = c("A","B","C","D","E","F"), Index = c(1,2,3,4,5,6))
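# reshape Team1/Team2 to long format (Winner stays as an id column), then attach the index
plouf <- merge(df2, melt(df1, measure.vars = 1:2), by.x = "Country", by.y = "value")
# look up each row's winner in df2 instead of relying on recycling a shorter vector
plouf[, winneridx := df2$Index[match(Winner, df2$Country)]]
dcast(plouf, Country + winneridx ~ variable, value.var = "Index")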
   Country winneridx Team1 Team2
1:       A         1     1    NA
2:       B         5     2    NA
3:       C         6     3    NA
4:       D         1    NA     4
5:       E         5    NA     5
6:       F         6    NA     6
This is basically the same as giocomai's answer, except that it uses purrr to help eliminate the repetition:
library(rlang)
library(dplyr)
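getIndexCols <- function(df1, df2, colName){
  idxColName <- sym(paste0(colName, "_Index"))
  # rename df2 so its key column matches colName, then join on that column
  df1 %>% left_join(df2 %>% rename(!! sym(colName) := Country, !! idxColName := Index), by = colName)
}
# build one joined table per column, then combine them on the shared team columns
# (reduce comes from purrr, so it is namespaced like map)
names(df1) %>% purrr::map(~ getIndexCols(df1, df2, .)) %>% purrr::reduce(~ left_join(.x, .y, by = names(df1)))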
You can use chartr, which takes both the Country and Index columns into account:
df3=as.matrix(setNames(df1,paste0(names(df1),"_index")))
cbind(df1,chartr(paste0(df2$Country,collapse=""),paste0(df2$Index,collapse=""),df3))
  Team1 Team2 Winner Team1_index Team2_index Winner_index
1     A     D      A           1           4            1
2     B     E      E           2           5            5
3     C     F      F           3           6            6
You can also do:
cbind(df1,do.call(chartr,c(as.list(sapply(unname(df2),paste,collapse="")),list(df3))))
  Team1 Team2 Winner Team1_index Team2_index Winner_index
1     A     D      A           1           4            1
2     B     E      E           2           5            5
3     C     F      F           3           6            6
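A caveat worth illustrating: chartr translates one character at a time, so the approach above relies on every country code and every index being a single character. A sketch with a hypothetical 12-entry lookup (not from the original data; codes, index, old and new are illustrative names) shows what happens once an index reaches two digits:

```r
# Hypothetical lookup with 12 single-letter codes and indices 1..12
codes <- LETTERS[1:12]
index <- 1:12

old <- paste0(codes, collapse = "")  # "ABCDEFGHIJKL"
new <- paste0(index, collapse = "")  # "123456789101112"

# chartr pairs old and new character by character, so "J" maps to "1", not "10"
chartr(old, new, c("A", "J", "K", "L"))     # "1" "1" "0" "1"

# A match-based lookup does not have this limitation
index[match(c("A", "J", "K", "L"), codes)]  # 1 10 11 12
```

For the six single-letter teams in the question the chartr version gives the expected result; for longer codes or larger indices a match-based lookup is the safer choice.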