简体   繁体   English

根据另一个数据帧中的匹配条件将列添加到 R 中的数据帧

[英]Adding column to a dataframe in R based on matching conditions in another dataframe

I have 2 data frames and I would like to add a column to one and add values from another data frame based on matching conditions.我有 2 个数据框,我想向其中添加一列,并根据匹配条件从另一个数据框中添加值。 I would then like to repeat this for many large datasets.然后我想对许多大型数据集重复这一点。

# Example dataframe

DF1 <- data.frame(Batch = c('a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b', 'b'),
           Patch1 = c(1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3),
           Patch2 = c(2, 3, 1, 3, 1, 2, 2, 3, 1, 3, 1, 2))

DF2 <- data.frame(Batch = c('a', 'a', 'a', 'b', 'b', 'b'),
                             Patch = c(1, 2, 3, 1, 2, 3),
                             Pop_size = sample(1:300, size=6, replace=TRUE)) 

To DF1 I would like to add 2 columns (Patch1_popsize and Patch2_popsize) where the Pop_size is given in DF2 with corresponding Patch numbers for a given Batch对于 DF1,我想添加 2 列(Patch1_popsize 和 Patch2_popsize),其中 Pop_size 在 DF2 中给出,并带有给定批次的相应补丁号

Try using match() in this style:尝试以这种方式使用match()

#Code
DF1$Patch1_Pop_size <- DF2[match(paste(DF1$Batch,DF1$Patch1),paste(DF2$Batch,DF2$Patch)),"Pop_size"]
DF1$Patch2_Pop_size <- DF2[match(paste(DF1$Batch,DF1$Patch2),paste(DF2$Batch,DF2$Patch)),"Pop_size"]

Output:输出:

DF1
   Batch Patch1 Patch2 Patch1_Pop_size Patch2_Pop_size
1      a      1      2             137             254
2      a      1      3             137             211
3      a      2      1             254             137
4      a      2      3             254             211
5      a      3      1             211             137
6      a      3      2             211             254
7      b      1      2              78              81
8      b      1      3              78              43
9      b      2      1              81              78
10     b      2      3              81              43
11     b      3      1              43              78
12     b      3      2              43              81

Try this:尝试这个:

DF3 <- merge(DF1,DF2, by.x = c("Batch","Patch1"), by.y=c("Batch","Patch"), all.x=TRUE) %>%
      rename(Pop_size1=Pop_size)
DF3 <- merge(DF3,DF2, by.x = c("Batch","Patch2"), by.y=c("Batch","Patch"), all.x=TRUE) %>%
      rename(Pop_size2=Pop_size)

We can use data.table methods which would be faster and efficient我们可以使用data.table方法,这会更快更有效

library(data.table)
setDT(DF1)[DF2, Patch1_Pop_size := Pop_size,
  on = .(Batch, Patch1 = Patch)][DF2, 
   Patch2_Pop_size := Pop_size, on = .(Batch, Patch2 = Patch)]

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM