How to collapse redundant rows together to get rid of mirrored NAs in two columns?

I'm modifying this toy df from this question, which is similar to mine but different enough that its answer has left me slightly confused.

df <- data.frame(id1 = c("a", NA, NA, "c"),
                 id2 = c(NA,"a","a",NA),
                 id3 = c("a", "a", "e", "e"),
                 n1 = c(2,2,3,3),
                 n2 = c(2,2,1,1),
                 n3 = c(0,0,3,3),
                 n4 = c(0,0,2,2))

This produces a dataframe looking like this:

id1 id2 id3 n1 n2 n3 n4
a   NA  a   2  2  0  0
NA  a   a   2  2  0  0
NA  a   e   3  1  3  2
c   NA  e   3  1  3  2

Aside from id1 and id2, the first two rows and the last two rows are identical. I'm trying to fill in the blanks to make them completely identical, so I can then apply distinct() so that the now-duplicated rows disappear, resulting in a dataframe like this:

id1 id2 id3 n1 n2 n3 n4
a   a   a   2  2  0  0
c   a   e   3  1  3  2

Is there any way to accomplish this (preferably a tidyverse solution)? I'm basically trying to collapse all my data's redundancies.

Perhaps something like this?

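library(dplyr)

df %>% 
  # rows that agree on everything except id1/id2 fall into the same group
  group_by(id3, n1, n2, n3, n4) %>% 
  # keep the single non-NA id1 and id2 of each group
  summarise(id1 = na.omit(id1),
            id2 = na.omit(id2),
            .groups = "drop") %>% 
  select(id1, id2, id3, n1, n2, n3, n4)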

Output:

# A tibble: 2 × 7
  id1   id2   id3      n1    n2    n3    n4
  <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 a     a     a         2     2     0     0
2 c     a     e         3     1     3     2

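This solution is very specific to this scenario: it relies on every group having exactly one non-NA id1 and exactly one non-NA id2. If a group contained several different id1 values, for example, na.omit() would return more than one value and the group would no longer collapse to a single row.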
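
One way to relax that restriction, sketched here under assumptions that are not part of the original answer, is to paste all distinct non-NA ids of a group into one string so that each group still yields one row. The data frame df2 (the toy data plus an extra row with id1 = "d") and the helper collapse_ids() are hypothetical names made up for this illustration.

```r
library(dplyr)

# Hypothetical variant of the toy data: the "e" group now has two id1 values.
df2 <- data.frame(id1 = c("a", NA, NA, "c", "d"),
                  id2 = c(NA, "a", "a", NA, NA),
                  id3 = c("a", "a", "e", "e", "e"),
                  n1 = c(2, 2, 3, 3, 3),
                  n2 = c(2, 2, 1, 1, 1),
                  n3 = c(0, 0, 3, 3, 3),
                  n4 = c(0, 0, 2, 2, 2))

# Hypothetical helper: join the unique non-NA values, or return NA if there are none.
collapse_ids <- function(x) {
  x <- unique(x[!is.na(x)])
  if (length(x) == 0) NA_character_ else paste(x, collapse = ",")
}

res <- df2 %>%
  group_by(id3, n1, n2, n3, n4) %>%
  # always one value per group, however many ids the group holds
  summarise(across(c(id1, id2), collapse_ids), .groups = "drop") %>%
  select(id1, id2, id3, n1, n2, n3, n4)

res
```

Whether a comma-separated string is acceptable depends on what the ids are used for afterwards; keeping one row per distinct id may be preferable in other cases.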

Another possible solution where I first created an index to group on:

library(dplyr)

df <- data.frame(id1 = c("a", NA, NA, "c"),
                 id2 = c(NA, "a", "a", NA),
                 id3 = c("a", "a", "e", "e"),
                 n1 = c(2, 2, 3, 3),
                 n2 = c(2, 2, 1, 1),
                 n3 = c(0, 0, 3, 3),
                 n4 = c(0, 0, 2, 2))

df %>%
  # index pairs up adjacent rows: 1, 1, 2, 2
  mutate(index = rep(seq_len(2), each = 2)) %>%
  group_by(index) %>%
  # take the first non-NA value of every column within each pair
  summarise(across(everything(), ~ first(.x[!is.na(.x)]))) %>%
  select(-index)
#> # A tibble: 2 × 7
#>   id1   id2   id3      n1    n2    n3    n4
#>   <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 a     a     a         2     2     0     0
#> 2 c     a     e         3     1     3     2

Created on 2022-07-09 by the reprex package (v2.0.1)

Another possible solution:

library(tidyverse)

df %>% 
  group_by(id3, across(n1:n4)) %>% 
  fill(id1:id2, .direction = "updown") %>% 
  ungroup %>% 
  distinct

#> # A tibble: 2 × 7
#>   id1   id2   id3      n1    n2    n3    n4
#>   <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 a     a     a         2     2     0     0
#> 2 c     a     e         3     1     3     2
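
The same fill-then-distinct idea can be wrapped in a small function so that the id columns and the grouping columns are passed in as names rather than hard-coded. This is only a sketch: collapse_mirrored(), id_cols and key_cols are hypothetical names, and it assumes the missing ids are real NA values rather than the string "NA".

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: fill NAs in id_cols within groups defined by key_cols,
# then drop the rows that have become duplicates.
collapse_mirrored <- function(data, id_cols, key_cols) {
  data %>%
    group_by(across(all_of(key_cols))) %>%
    fill(all_of(id_cols), .direction = "updown") %>%
    ungroup() %>%
    distinct()
}

df <- data.frame(id1 = c("a", NA, NA, "c"),
                 id2 = c(NA, "a", "a", NA),
                 id3 = c("a", "a", "e", "e"),
                 n1 = c(2, 2, 3, 3),
                 n2 = c(2, 2, 1, 1),
                 n3 = c(0, 0, 3, 3),
                 n4 = c(0, 0, 2, 2))

out <- collapse_mirrored(df,
                         id_cols = c("id1", "id2"),
                         key_cols = c("id3", "n1", "n2", "n3", "n4"))
out
```

Unlike the summarise() approach, fill() leaves rows with different ids in the same group as separate rows, so nothing is silently merged.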
