
Moving replicated data into the next columns in R

My data are as follows:

 df <- read.table(text = "M X Z
 'bam' 12 'B1'
 'sdr' 11 'B3'
 'kar' 13 'B5'
 'mmn' 13 'B7'
 'bam' 14 'B4'
 'kar' 17 'B1'
 'bam' 10 'B6'
 'zar' 11 'B8'
 'mmn' 12 'B12'", header = TRUE)

I want to move the repeated entries into new columns. Take "bam", which appears three times: its second and third occurrences should be moved into additional columns on the row where "bam" first appears, and the same goes for every other repeated value. Once the repeated entries have been moved, their original rows are blanked out (filled with NA), which gives the following table:

 df <- read.table(text = "M X Z X1 Z1 X2 Z2
 'bam' 12 'B1' 14 'B4' 10 'B6'
 'sdr' 11 'B3' NA NA NA NA
 'kar' 13 'B5' 17 'B1' NA NA
 'mmn' 13 'B7' 12 'B12' NA NA
 NA NA NA NA NA NA NA
 NA NA NA NA NA NA NA
 NA NA NA NA NA NA NA
 'zar' 11 'B8' NA NA NA NA
 NA NA NA NA NA NA NA", header = TRUE)

I understand that I should show my own attempt, but I have not been able to come up with a solution.

One possible way is to use unnest_wider() from tidyr. When a list column is unnested, the name of each list element is automatically used as the column name.
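A minimal sketch of that naming behaviour, using a made-up two-row tibble (the names `toy`, `id` and `rec` are hypothetical and not part of the question):

```r
library(tibble)
library(tidyr)

# Hypothetical toy data: one list column whose elements are named lists
toy <- tibble(
  id  = c("a", "b"),
  rec = list(
    list(X_1 = 1L, X_2 = 2L),  # two named elements
    list(X_1 = 3L)             # only one element, so X_2 has no value here
  )
)

# Each element name (X_1, X_2) becomes a column; missing elements become NA
wide <- unnest_wider(toy, rec)
wide
```

Row "b" has no `X_2` element, so that cell is filled with NA, which is the same mechanism that pads the shorter groups in the answer's result.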

I believe there is a better way of constructing the records list, but this is the best I can think of for now.

library(dplyr)
library(tidyr)

df1 <- df %>%
    group_by(M) %>%
    # convert column X and Z to a list column with each element named as
    # X_1,X_2,... and Z_1,Z_2, ...
    summarise(records = list(
        append(
            as.list(X) %>% setNames(paste0("X_",seq_along(X))),
            as.list(Z) %>% setNames(paste0("Z_",seq_along(Z)))
        ))
    ) %>%
    # when unnested, the name of each list element is automatically used as the column name
    unnest_wider(records)
df1

# A tibble: 5 x 7
  M       X_1   X_2   X_3 Z_1   Z_2   Z_3  
  <chr> <int> <int> <int> <chr> <chr> <chr>
1 bam      12    14    10 B1    B4    B6   
2 kar      13    17    NA B5    B1    NA   
3 mmn      13    12    NA B7    B12   NA   
4 sdr      11    NA    NA B3    NA    NA   
5 zar      11    NA    NA B8    NA    NA   
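As a possible alternative to building the records list by hand, the same wide shape can probably be reached by numbering the occurrences within each group and handing that counter to pivot_wider(). This is a different technique from the one above, offered only as a sketch; the helper column `occ` and the result name `alt` are assumptions introduced here for illustration.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, read as character rather than factor
df <- read.table(text = "M X Z
bam 12 B1
sdr 11 B3
kar 13 B5
mmn 13 B7
bam 14 B4
kar 17 B1
bam 10 B6
zar 11 B8
mmn 12 B12", header = TRUE, stringsAsFactors = FALSE)

alt <- df %>%
  group_by(M) %>%
  mutate(occ = row_number()) %>%  # 1 = first occurrence, 2 = second, ...
  ungroup() %>%
  # spread X and Z by occurrence number: X_1, X_2, X_3, Z_1, Z_2, Z_3
  pivot_wider(names_from = occ, values_from = c(X, Z))
alt
```

Groups with fewer occurrences than "bam" get NA in the columns they cannot fill.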

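Here is one option. Loop over the column names of the dataset except the first one; for each column, group by 'M' and summarise it into a list; spread that list with unnest_wider; reduce the resulting list of data frames to a single data.frame with full_join; then right_join with the 'M' column of df1 and reorder the columns. Judging by the 9-row output, df1 here is the expected-output table from the question (with its all-NA rows), not the 5-row df1 created in the previous answer.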

library(purrr)
library(tidyr)
library(dplyr)
library(stringr)
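map(names(df)[-1], ~ df %>%
       # keep the key and one value column at a time
       dplyr::select(M, all_of(.x)) %>% 
       group_by(M) %>%
       # collapse the column into a named list per group: X1, X2, ...
       summarise(!! .x := list(as.list(!! rlang::sym(.x)) %>% 
                     set_names(str_c(.x, seq_along(.)))))  %>% 
       unnest_wider(all_of(.x))) %>% 
  # combine the per-column results into one data frame
  reduce(full_join, by = 'M') %>% 
  right_join(df1 %>%
                dplyr::select(M), by = 'M') %>% 
  # interleave the columns by their numeric suffix: X1, Z1, X2, Z2, ...
  dplyr::select(M, order(str_remove(names(.)[-1], "\\D+")) + 1)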
# A tibble: 9 x 7
#  M        X1 Z1       X2 Z2       X3 Z3   
#  <fct> <int> <fct> <int> <fct> <int> <fct>
#1 bam      12 B1       14 B4       10 B6   
#2 sdr      11 B3       NA <NA>     NA <NA> 
#3 kar      13 B5       17 B1       NA <NA> 
#4 mmn      13 B7       12 B12      NA <NA> 
#5 <NA>     NA <NA>     NA <NA>     NA <NA> 
#6 <NA>     NA <NA>     NA <NA>     NA <NA> 
#7 <NA>     NA <NA>     NA <NA>     NA <NA> 
#8 zar      11 B8       NA <NA>     NA <NA> 
#9 <NA>     NA <NA>     NA <NA>     NA <NA> 
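The 9-row layout above relies on a table that already contains the blank rows. A sketch of one way to build that layout from the original data alone follows; it uses pivot_wider() and duplicated() rather than the approach in this answer, and the names `occ`, `wide`, `skeleton` and `layout` are assumptions made up for illustration.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, read as character rather than factor
df <- read.table(text = "M X Z
bam 12 B1
sdr 11 B3
kar 13 B5
mmn 13 B7
bam 14 B4
kar 17 B1
bam 10 B6
zar 11 B8
mmn 12 B12", header = TRUE, stringsAsFactors = FALSE)

# One row per M, with later occurrences spread into extra columns
wide <- df %>%
  group_by(M) %>%
  mutate(occ = row_number()) %>%
  ungroup() %>%
  pivot_wider(names_from = occ, values_from = c(X, Z))

# Skeleton with the original 9 rows; repeated keys are blanked to NA
skeleton <- tibble(M = ifelse(duplicated(df$M), NA_character_, df$M))

# NA keys find no match in `wide`, so those rows stay all-NA
layout <- left_join(skeleton, wide, by = "M")
layout
```

left_join() keeps the rows of `skeleton` in their original order, so the blank rows land where the repeated entries used to be.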
