简体   繁体   中英

How can I select two columns in a dataframe in R?

Imagine I got 2 df, named A and B. For each row of df A, I'd like to check if there is the respective row in B df. In the example below, the code would print me only one answer TRUE, because the last row in df A do not match with the last row in df B.

A <- NULL
B <- NULL
A <- data.frame(A = c('a','b','c','d','e'), B = c('1','2','3','4','5'))
B <- data.frame(A = c('a','b','c','d','f'), B = c('1','2','3','4','5'))

i <- 0
for(i in 1: length(A$A))
{
  point <- A[i,]
  if(!point %in% B[which[1:2]])
    print(TRUE)
}
bool = Reduce(paste, A) %in% Reduce(paste, B)
transform(A, msg = c("Absent", "Present")[bool + 1])
#  A B     msg
#1 a 1 Present
#2 b 2 Present
#3 c 3 Present
#4 d 4 Present
#5 e 5  Absent

You can check if the anti-join of the two tables contains any rows (ie if there are any rows not equal between the two data frames on the common columns), and print TRUE if so

if(diff_rows <- nrow(dplyr::anti_join(A, B)) > 0) print(diff_rows)

# Joining, by = c("A", "B")
# [1] TRUE
# Warning message:
# Column `A` joining factors with different levels, coercing to character vector

You can clean up the output some if you want to ignore warnings

if(diff_rows <- nrow(suppressWarnings(dplyr::anti_join(A, B, by = names(A)))) > 0) 
  print(diff_rows)

# [1] TRUE 

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM