Combining multiple rows for one ID into one row with multiple columns based on 2 different variables in R

Question

I am working with a dataframe in R that looks like this:

id <- c(1,1,1,2,2,3,3,3,3)
dx_code <- c("MI","HF","PNA","PNA","Cellulitis","MI","Flu","Sepsis","HF")
dx_date <- c("7/11/22","7/11/22","8/1/22","8/4/22","8/7/22","8/4/22","7/11/22","7/11/22","9/10/22")
df <- data.frame(id, dx_code, dx_date)
df

I want to be able to group it so that each patient ID has each date they were seen and each diagnosis they received on each specific date. So it would look something like:

id2 <- c(1,2,3)
dx_date1 <- c("7/11/22","8/4/22","8/4/22")
dx_date1code1 <- c("MI","PNA","MI")
dx_date1code2 <- c("HF",NA,NA)
dx_date2 <- c("8/1/22","8/7/22","7/11/22")
dx_date2code1 <- c("PNA","Cellulitis","Flu")
dx_date2code2 <- c(NA,NA,"Sepsis")
dx_date3 <- c(NA,NA,"9/10/22")
dx_date3code1 <- c(NA,NA,"HF")
df2 <- data.frame(id2, dx_date1, dx_date1code1,dx_date1code2,dx_date2,dx_date2code1,dx_date2code2,dx_date3,dx_date3code1)
df2

I am not sure how to reformat it in this way - is there a function in R, or should I try to use for loops? I would appreciate any help - thanks so much!

Answer 1

We can use a pivot_longer followed by a pivot_wider after some modifications.

library(tidyr)
 df3 <- df %>% group_by(id) %>% 
   mutate(
     dx_code =paste('code', row_number()), 
     dx_date =paste('date', row_number())) %>% 
   pivot_longer(cols = c('dx_code', 'dx_date')) %>% 
   mutate(value = ifelse(grepl('code', name), dx_code, dx_date)) %>% 
   group_by(id, name) %>% mutate(
     name = paste(name, row_number()))

      id name      value     
   <dbl> <chr>     <chr>     
 1     1 dx_code 1 MI        
 2     1 dx_date 1 7/11/22   
 3     1 dx_code 2 PNA       
 4     1 dx_date 2 8/4/22

We now have the data in a format suitable for pivot_wider .

df3 %>% pivot_wider(names_from = name, values_from = value) 

# Groups:   id [3]
     id `dx_code 1` `dx_date 1` `dx_code 2` `dx_date 2` `dx_code 3` `dx_date 3` `dx_code 4` `dx_date 4`
  <dbl> <chr>       <chr>       <chr>       <chr>       <chr>       <chr>       <chr>       <chr>      
1     1 MI          7/11/22     PNA         8/4/22      Cellulitis  8/4/22      NA          NA         
2     2 MI          7/11/22     PNA         8/4/22      NA          NA          NA          NA         
3     3 MI          7/11/22     PNA         8/4/22      Cellulitis  8/4/22      Flu         7/11/22

Answer 2

I believe you can use pivot_wider for this. The output is not the same is in the original post, but similar to what you provided in your comment.

You can enumerate dates and codes after grouping by id using row_number() .

After using pivot_wider , you can select column names based on the numeric value contained, which will reorder so that dates and codes columns are next to each other.

library(tidyverse)

df %>%
  group_by(id) %>%
  mutate(code_num = row_number()) %>%
  pivot_wider(id_cols = id, 
              names_from = code_num, 
              values_from = c(dx_date, dx_code)) %>%
  select(id, names(.)[-1][order(readr::parse_number(names(.)[-1]))])

Output

     id dx_date_1 dx_code_1 dx_date_2 dx_code_2  dx_date_3 dx_code_3 dx_date_4 dx_code_4
  <dbl> <chr>     <chr>     <chr>     <chr>      <chr>     <chr>     <chr>     <chr>    
1     1 7/11/22   MI        7/11/22   HF         8/1/22    PNA       NA        NA       
2     2 8/4/22    PNA       8/7/22    Cellulitis NA        NA        NA        NA       
3     3 8/4/22    MI        7/11/22   Flu        7/11/22   Sepsis    9/10/22   HF

Combining multiple rows for one ID into one row with multiple columns based on 2 different variables in R

Question

1 answers

solution1
0 2022-08-13 21:05:19

solution2
0 2022-08-15 01:02:31

Combining multiple rows for one ID into one row with multiple columns based on 2 different variables in R

Question

1 answers

solution1 0 2022-08-13 21:05:19

solution2 0 2022-08-15 01:02:31

solution1
0 2022-08-13 21:05:19

solution2
0 2022-08-15 01:02:31