Delete only one row of duplicated rows with a criteria

Question

My data frame is much bigger than this one, but this small example shows the idea of what I want to do:

x = data.frame(
  A = c(3, 4, 5, 7, 9),
  B = c(7, 8, 9, 3, 5),
  C = c(11, 12, 13, 14, 18)
)

I consider rows 1 and 4 to be the same because, for me, the pairs (3,7) and (7,3) are equivalent (likewise rows 3 and 5, with the pairs (5,9) and (9,5)). With this criterion I would like to keep only one row for each pair.

The result should be this:

 x = data.frame(
     A= c(3, 4, 5),
     B= c(7, 8, 9),
     C= c(11, 12, 13)
     )

How can I do this?

Is it possible to do this with the function subset?

Answer 1

library(dplyr)

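# Build an order-independent key for each (A, B) pair: smaller value first.
# The "-" separator keeps the key unambiguous when values have several digits.
x <- x %>%
  group_by(A, B) %>%
  mutate(AB = paste0(min(A, B), "-", max(A, B)))

# Keep the first row for each key and drop the helper column (column 4)
x[!duplicated(x$AB), -4]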

# # A tibble: 3 x 3
# # Groups:   A, B [3]
#       A     B     C
#   <dbl> <dbl> <dbl>
# 1     3     7    11
# 2     4     8    12
# 3     5     9    13
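
A variant that skips the grouping step and the pasted string key, offered as a sketch rather than a replacement: pmin/pmax order each pair row by row, and dplyr's distinct() with .keep_all = TRUE keeps the first row per pair. The helper column names lo and hi are made up for this example.

```r
library(dplyr)

x <- data.frame(
  A = c(3, 4, 5, 7, 9),
  B = c(7, 8, 9, 3, 5),
  C = c(11, 12, 13, 14, 18)
)

result <- x %>%
  mutate(lo = pmin(A, B), hi = pmax(A, B)) %>%  # order each pair row-wise
  distinct(lo, hi, .keep_all = TRUE) %>%        # keep the first row per unordered pair
  select(-lo, -hi)                              # drop the helper columns

result
```

Because no group_by is involved, the result is a plain data frame with no leftover grouping.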

Answer 2

A base R solution. Use ifelse and paste0 to create a variable that combines A and B and puts the smaller value first. Then you can use duplicated to identify the repeated keys, and subset on that.

index <- ifelse(x$A<x$B, paste0(x$A, '-', x$B), paste0(x$B, '-', x$A))
index

[1] "3-7" "4-8" "5-9" "3-7" "5-9"

x[!duplicated(index),]

  A B  C
1 3 7 11
2 4 8 12
3 5 9 13

Since you mention subset: it does the same as [ ] here.

subset(x, !duplicated(index))

  A B  C
1 3 7 11
2 4 8 12
3 5 9 13
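
The '-' separator in the key matters. A small sketch with made-up values (not from the question) shows how two different pairs can collapse into the same key when the numbers are pasted together with nothing between them:

```r
# Hypothetical pairs, chosen only to illustrate the collision:
# (1, 112) and (11, 12) are different pairs.
a <- c(1, 11)
b <- c(112, 12)

no_sep   <- paste0(pmin(a, b), pmax(a, b))       # keys without a separator
with_sep <- paste0(pmin(a, b), "-", pmax(a, b))  # keys with a separator

duplicated(no_sep)    # the second pair is wrongly flagged as a duplicate
duplicated(with_sep)  # no duplicates flagged
```

With the data in the question this cannot happen, since every value has a single digit, but on a bigger data frame it is safer to keep the separator.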

Answer 3

Here is a base R option with pmin/pmax and duplicated

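# cbind() is needed here: with() evaluates only one expression, so passing
# pmax(A, B) as a separate argument would silently ignore it.
x[!duplicated(with(x, cbind(pmin(A, B), pmax(A, B)))), ]
#  A B  C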
#1 3 7 11
#2 4 8 12
#3 5 9 13
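
For a bigger data frame, the pmin/pmax idea could be wrapped in a small function. This is only a sketch: the name drop_unordered_dupes and its arguments are hypothetical, and it assumes the two columns hold comparable values such as numbers.

```r
# Hypothetical helper: keep the first row for each unordered pair of values
# found in columns `a` and `b` (column names given as strings).
drop_unordered_dupes <- function(df, a, b) {
  key <- data.frame(
    lo = pmin(df[[a]], df[[b]]),  # smaller value of each pair
    hi = pmax(df[[a]], df[[b]])   # larger value of each pair
  )
  # duplicated() on a data frame compares whole rows of the key
  df[!duplicated(key), , drop = FALSE]
}

x <- data.frame(
  A = c(3, 4, 5, 7, 9),
  B = c(7, 8, 9, 3, 5),
  C = c(11, 12, 13, 14, 18)
)

res <- drop_unordered_dupes(x, "A", "B")
res
```

To keep the last row of each pair instead of the first, duplicated() accepts fromLast = TRUE.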


3 answers

solution1
3 ACCEPTED 2018-08-19 00:11:03

solution2
1 2018-08-19 10:03:18

solution3
1 2018-08-19 10:41:51
