
How can I check for differences across multiple columns in different datasets (in R)?

I feel like this may be a common question, but I haven't been able to find an answer so far. I have two datasets: a "standard" dataset and a "reference" dataset. In practice, the columns in the standard dataset are used as keys to merge in many columns from the reference dataset, simulated here as "xtrcols". I need to be able to tell why the merge fails when it does. For example, given the two data frames:

df <- data.frame(
  id = c(1,2,3),
  batch = c(40,45,42),
  dil = c(100, 1000, 2000)
)

refdf <- data.frame(
  id = c(rep(1, 5), rep(2, 5), rep(3, 5)),
  batch = c(rep(40, 5), rep(41, 5), rep(42, 5)),
  dil = rep(c(1, 10, 100, 1000, 10000), 3),
  xtrcols = rep(c("a", "b", "c"), 5)
)

and I merge like this:

merged <- merge(df, refdf, by = c("id", "batch", "dil"), all.x = TRUE)

I need to know if the merge for a given "id" failed because the batch is incorrect, such as would be the case for id = 2, or if the merge failed because the value for "dil" does not exist in refdf, as would be the case for id = 3. I've tried a few ways of iterating through the dataframes and using match(), but so far nothing has really worked as anticipated.

This would be my anticipated outcome of running the code above. The merge failed for ids 2 and 3, but for different reasons. I'm trying to find a way to tell which of those two reasons caused the merge to fail, so I can return a specific error message for each case.

[Image of the anticipated outcome; not included]

Any insight would be much appreciated.

Update for the changed data frames: anti_join() again helps, this time matching on all three columns:

library(dplyr)  # provides anti_join()

anti_join(df, refdf, by = c("id", "batch", "dil"))
  id batch  dil
1  2    45 1000
2  3    42 2000
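
If dplyr is not available, the same set of unmatched rows can be found in base R, along the lines of the match() attempts mentioned in the question. This is only a minimal sketch: `row_key` is a hypothetical helper (not part of any package) that pastes the key columns into one string per row, and it assumes the key values print identically in both data frames and never contain the separator.

```r
df <- data.frame(
  id = c(1, 2, 3),
  batch = c(40, 45, 42),
  dil = c(100, 1000, 2000)
)

refdf <- data.frame(
  id = c(rep(1, 5), rep(2, 5), rep(3, 5)),
  batch = c(rep(40, 5), rep(41, 5), rep(42, 5)),
  dil = rep(c(1, 10, 100, 1000, 10000), 3),
  xtrcols = rep(c("a", "b", "c"), 5)
)

keys <- c("id", "batch", "dil")

# Hypothetical helper: collapse the key columns of each row into one string
row_key <- function(d, cols) do.call(paste, c(d[cols], sep = "_"))

# Keep the rows of df whose composite key never occurs in refdf
unmatched <- df[!(row_key(df, keys) %in% row_key(refdf, keys)), ]
```

With the example data, `unmatched` holds the rows for ids 2 and 3, the same rows anti_join() reports.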

First answer: We could use anti_join(), which returns all rows from x without a match in y.

library(dplyr)

anti_join(df, refdf, by=c("id", "batch"))
  id batch  dil
1  2    45 1000
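
The two anti_join() calls can be combined to label why each row failed, which is what the question asks for. The following is a sketch, not a definitive solution: the `reason` column and its message strings are made up for illustration, and it assumes a row that fails on id/batch should be reported as a batch problem regardless of its dil value.

```r
library(dplyr)

df <- data.frame(
  id = c(1, 2, 3),
  batch = c(40, 45, 42),
  dil = c(100, 1000, 2000)
)

refdf <- data.frame(
  id = c(rep(1, 5), rep(2, 5), rep(3, 5)),
  batch = c(rep(40, 5), rep(41, 5), rep(42, 5)),
  dil = rep(c(1, 10, 100, 1000, 10000), 3),
  xtrcols = rep(c("a", "b", "c"), 5)
)

# Rows of df whose id/batch pair never occurs in refdf
bad_batch <- anti_join(df, refdf, by = c("id", "batch"))

# Rows of df with no match on all three keys
unmatched <- anti_join(df, refdf, by = c("id", "batch", "dil"))

# Unmatched rows whose id/batch pair does exist, so only dil is missing
bad_dil <- anti_join(unmatched, bad_batch, by = c("id", "batch"))

# `reason` is a hypothetical column name used only for this illustration
failures <- bind_rows(
  mutate(bad_batch, reason = "batch not found for this id"),
  mutate(bad_dil, reason = "dil not found for this id and batch")
)
```

With the example data, `failures` has one row for id 2 labelled as a batch problem and one row for id 3 labelled as a dil problem; the `reason` column can then drive the specific error message.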
