Remove rows based on duplicated values in another column in R (remove specific rows)

Question

My dataset has two columns. POINT contains only two categorical values, 'random' and 'current', repeated throughout the dataset. ID contains 5-digit numeric values associated with the values in POINT. Some of the values in ID are repeated.

I cannot figure out how to write R code that removes ONLY the rows where POINT is 'random' and the ID value also appears in a 'current' row. So I would like the dataset below:

POINT	ID
Current	45905
Current	40817
Current	55936
Current	66608
Current	66608
Random	45905
Random	40817
Random	55936
Random	66608
Random	44456

to look like this:

POINT	ID
Current	45905
Current	40817
Current	55936
Current	66608
Current	66608
Random	44456

Answer 1

Using dplyr, this can be achieved as follows:

Split your data by POINT.
Use an anti_join to keep only the Random rows whose ID does not occur in the Current part.
Row-bind the filtered Random rows to the Current rows.

d <- data.frame(
  stringsAsFactors = FALSE,
             POINT = c("Current","Current","Current",
                       "Current","Current","Random","Random","Random",
                       "Random","Random"),
                ID = c(45905L,40817L,55936L,66608L,
                       66608L,45905L,40817L,55936L,66608L,44456L)
)

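library(dplyr)

# Split into a list with one data frame per POINT value
d_split <- split(d, d$POINT)

# Keep only the Random rows whose ID does not occur in the Current part
random_keep <- anti_join(d_split$Random, d_split$Current, by = "ID")

# Stack the untouched Current rows on top of the remaining Random rows
d_final <- bind_rows(d_split$Current, random_keep)

head(d_final)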
#>     POINT    ID
#> 1 Current 45905
#> 2 Current 40817
#> 3 Current 55936
#> 4 Current 66608
#> 5 Current 66608
#> 6  Random 44456
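
For comparison, the same result can probably be reached without splitting, by filtering in a single step. This is only a sketch on the example data from the question; the names current_ids and d_filtered are introduced here for illustration. Unlike the split-and-bind approach, it keeps the rows in their original order.

```r
library(dplyr)

# Same example data as in the question
d <- data.frame(
  POINT = c(rep("Current", 5), rep("Random", 5)),
  ID = c(45905L, 40817L, 55936L, 66608L, 66608L,
         45905L, 40817L, 55936L, 66608L, 44456L),
  stringsAsFactors = FALSE
)

# IDs that occur in at least one Current row
current_ids <- d$ID[d$POINT == "Current"]

# Keep every Current row, plus the Random rows whose ID is not in current_ids
d_filtered <- d %>%
  filter(POINT == "Current" | !ID %in% current_ids)

d_filtered
```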

Answer 2

If I understood you right, you could use dplyr to do this:

library(dplyr)

split_data <- split(your_data, ~ POINT)

full_join(split_data$Current, split_data$Random, by = "ID") %>%
  transmute(POINT = coalesce(POINT.x, "Random"), ID)

Returns:

# A tibble: 6 x 2
  POINT      ID
  <chr>   <int>
1 Current 45905
2 Current 40817
3 Current 55936
4 Current 66608
5 Current 66608
6 Random  44456

(Data used:)

your_data <- structure(list(POINT = c("Current", "Current", "Current", "Current", "Current", "Random", "Random", "Random", "Random", "Random"), ID = c(45905L, 40817L, 55936L, 66608L, 66608L, 45905L, 40817L, 55936L, 66608L, 44456L)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))
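
One thing to watch with the full_join approach: if an ID occurs more than once in the Random part and also occurs in the Current part, the matching Current rows are repeated in the join result. The sketch below uses made-up data (dup_data is hypothetical, not from the question) to compare this with an anti_join, which leaves the Current rows untouched.

```r
library(dplyr)

# Hypothetical data: ID 66608 occurs once as Current but twice as Random
dup_data <- data.frame(
  POINT = c("Current", "Random", "Random", "Random"),
  ID = c(66608L, 66608L, 66608L, 44456L),
  stringsAsFactors = FALSE
)

parts <- split(dup_data, dup_data$POINT)

# full_join approach: the single Current row is matched to both Random rows
via_full_join <- full_join(parts$Current, parts$Random, by = "ID") %>%
  transmute(POINT = coalesce(POINT.x, "Random"), ID)

# anti_join approach: Current rows are kept exactly as they are
via_anti_join <- bind_rows(
  parts$Current,
  anti_join(parts$Random, parts$Current, by = "ID")
)

nrow(via_full_join)  # 3: the Current row now appears twice
nrow(via_anti_join)  # 2
```

On the data from the question this difference does not show up, because no Random ID is repeated.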

Answer 3

You can use a negated %in% to drop the rows where POINT is Random and the ID also occurs in a Current row.

i <- D$POINT=="Current"
D[i | !D$ID %in% D$ID[i],]
#     POINT    ID
#1  Current 45905
#2  Current 40817
#3  Current 55936
#4  Current 66608
#5  Current 66608
#10  Random 44456

Data:

D <- data.frame(POINT = c("Current","Current","Current","Current","Current"
  ,"Random","Random","Random","Random","Random")
, ID = c(45905L,40817L,55936L,66608L,66608L,45905L,40817L,55936L,66608L,44456L))
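
The question writes the labels as 'random' and 'current', while the sample data uses "Random" and "Current". If the capitalisation in the real data is inconsistent, the same base R idea could be wrapped in a small helper that compares POINT case-insensitively. This is a sketch under that assumption; drop_matched_random and the mixed-case data below are hypothetical.

```r
# Hypothetical helper: keep all "current" rows and drop "random" rows whose ID
# also occurs among the "current" rows. POINT is compared case-insensitively.
drop_matched_random <- function(df) {
  is_current <- tolower(df$POINT) == "current"
  df[is_current | !df$ID %in% df$ID[is_current], , drop = FALSE]
}

# Made-up data with mixed capitalisation
D_mixed <- data.frame(
  POINT = c("current", "Current", "Random", "random"),
  ID = c(45905L, 40817L, 45905L, 44456L)
)

res <- drop_matched_random(D_mixed)
res
```

Here the third row is dropped because its ID, 45905, also belongs to a current row.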

solution1
0 2021-06-29 08:20:09

solution2
0 2021-06-29 08:21:57

solution3
0 2021-06-29 09:22:40
