What would be a good tidyverse approach to this type of problem? I want to filter out the duplicated rows of group that have an NA in them (keeping the row that has values for both var1 and var2), but keep the rows when there is no duplicated value in group. dat illustrates the raw example, with expected_output showing what I'd hope to have.
library(dplyr)
library(tibble)
dat <- tibble::tribble(
  ~group, ~var1, ~var2,
  "A",    "foo", NA,
  "A",    "foo", "bar",
  "B",    "foo", NA,
  "C",    NA,    "bar",
  "C",    "foo", "bar",
  "D",    NA,    "bar",
  "E",    "foo", "bar",
  "E",    NA,    "bar"
)
expected_output <- tibble::tribble(
  ~group, ~var1, ~var2,
  "A",    "foo", "bar",
  "B",    "foo", NA,
  "C",    "foo", "bar",
  "D",    NA,    "bar",
  "E",    "foo", "bar"
)
expected_output
#> # A tibble: 5 x 3
#> group var1 var2
#> <chr> <chr> <chr>
#> 1 A foo bar
#> 2 B foo <NA>
#> 3 C foo bar
#> 4 D <NA> bar
#> 5 E foo bar
Any suggestions or ideas?
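Solution 1 - if the complete row can sit at a different position in each group (e.g. first, last or somewhere in between). This relies on arrange() always sorting NA values last, so after sorting by group, var1 and var2 the complete row comes first in its group as long as its var1 value does not sort after that of an incomplete row; slice_head() then keeps that first row.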
dat %>%
  arrange(group, var1, var2) %>%  # NA values always sort last
  group_by(group) %>%
  slice_head(n = 1) %>%            # keep the first row of each group
  ungroup()
Output:
# A tibble: 5 x 3
group var1 var2
<chr> <chr> <chr>
1 A foo bar
2 B foo NA
3 C foo bar
4 D NA bar
5 E foo bar
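The whole of Solution 1 hinges on how arrange() treats missing values. The sketch below checks that on a toy column (the names x and val are only for illustration), and then shows the limitation of sorting on var1 before var2 using a hypothetical group F that is not part of dat.

```r
library(dplyr)
library(tibble)

# arrange() always sorts missing values to the end,
# even when the column is wrapped in desc()
x <- tibble(val = c(NA, "b", "a"))
arrange(x, val)$val        # "a" "b" NA
arrange(x, desc(val))$val  # "b" "a" NA

# Caveat, using a hypothetical group F: var1 is compared before var2,
# so an incomplete row with a smaller var1 value sorts first and is
# kept instead of the complete row
y <- tibble(group = c("F", "F"),
            var1  = c("foo", "zzz"),
            var2  = c(NA, "bar"))
kept <- y %>%
  arrange(group, var1, var2) %>%
  group_by(group) %>%
  slice_head(n = 1) %>%
  ungroup()
kept  # one row: F, foo, NA
```

So Solution 1 is safe for dat, where rows within a group never disagree on a non-missing var1 value, but it is not a general "most complete row" rule.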
Solution 2 - if the complete row is always the last row of its group
You can use duplicated() with fromLast = TRUE, which flags every row whose group value appears again further down. Negating that index keeps only the last row of each group and drops the earlier duplicates:
dat[!duplicated(dat$group, fromLast = TRUE), ]
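For the dat above this gives the output below. Note that it keeps the incomplete row for group E, because there the complete row comes first, so this only matches expected_output when the complete row is last in every group:
# A tibble: 5 x 3
group var1 var2
<chr> <chr> <chr>
1 A foo bar
2 B foo NA
3 C foo bar
4 D NA bar
5 E NA bar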
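To see exactly which rows that index keeps, here is duplicated() applied to the group column of dat, copied into a plain character vector (the names groups and dup are just for illustration):

```r
# The group column of dat, as a plain character vector
groups <- c("A", "A", "B", "C", "C", "D", "E", "E")

# TRUE marks an element whose value appears again later in the vector
dup <- duplicated(groups, fromLast = TRUE)
dup           # TRUE FALSE FALSE TRUE FALSE FALSE TRUE FALSE

# Negating keeps the last occurrence of each group
which(!dup)   # 2 3 5 6 8
```

Position 8 is group E's row with var1 missing, since in dat the complete E row is the first of the pair.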
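Another option is to keep, within each group, the row with the most non-missing values in var1 and var2. Note that n must be named (recent dplyr versions reject a positional n), and with_ties = FALSE makes sure only one row is returned even if two rows in a group are equally complete:
dat %>%
  group_by(group) %>%
  # count non-missing values per row; on dplyr >= 1.1.0, pick(var1, var2)
  # is the preferred replacement for across() without a function
  slice_max(rowSums(!is.na(across(c(var1, var2)))), n = 1, with_ties = FALSE) %>%
  ungroup()
# A tibble: 5 x 3
group var1 var2
<chr> <chr> <chr>
1 A foo bar
2 B foo <NA>
3 C foo bar
4 D <NA> bar
5 E foo bar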
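A more literal translation of the question ("drop duplicated rows that have an NA, keep groups with a single row") is a grouped filter(). This is a swapped-in alternative, not one of the answers above, and is only a sketch under two assumptions: every group with more than one row contains at least one complete row (otherwise that group disappears entirely), and each group has at most one complete row (otherwise all of them are kept).

```r
library(dplyr)
library(tibble)

dat <- tribble(
  ~group, ~var1, ~var2,
  "A",    "foo", NA,
  "A",    "foo", "bar",
  "B",    "foo", NA,
  "C",    NA,    "bar",
  "C",    "foo", "bar",
  "D",    NA,    "bar",
  "E",    "foo", "bar",
  "E",    NA,    "bar"
)

# Keep a row if it is the only row in its group,
# or if it has no missing values in var1 and var2
result <- dat %>%
  group_by(group) %>%
  filter(n() == 1 | (!is.na(var1) & !is.na(var2))) %>%
  ungroup()
result
```

Unlike the arrange()/slice_head() and duplicated() approaches, this does not depend on row order or on how the values sort.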