
Moving replicated data into the next columns in R

My data are as follows:

df <- read.table(text = "
M X Z
'bam' 12 'B1'
'sdr' 11 'B3'
'kar' 13 'B5'
'mmn' 13 'B7'
'bam' 14 'B4'
'kar' 17 'B1'
'bam' 10 'B6'
'zar' 11 'B8'
'mmn' 12 'B12'
", header = TRUE)

I want to move the replicated data into the next columns. Consider "bam": it appears three times. I want to keep it in the row where it first appears and move its other occurrences into the next columns of that row; the same goes for the other replicated values. Once the replicated values have been moved, they are removed from their original rows (which become NA), giving the following table:

df1 <- read.table(text = "
M X Z X1 Z1 X2 Z2
'bam' 12 'B1' 14 'B4' 10 'B6'
'sdr' 11 'B3' NA NA NA NA
'kar' 13 'B5' 17 'B1' NA NA
'mmn' 13 'B7' 12 'B12' NA NA
NA NA NA NA NA NA NA
NA NA NA NA NA NA NA
NA NA NA NA NA NA NA
'zar' 11 'B8' NA NA NA NA
NA NA NA NA NA NA NA
", header = TRUE)
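Whatever tool is used, the core step is numbering each occurrence of a value in M: 1 for its first appearance, 2 for the second, and so on. That number decides which pair of columns (X1/Z1, X2/Z2, ...) a repeated row moves into. A minimal base-R sketch, assuming the input data frame above; the helper column `n` is a hypothetical name:

```r
# Input data from the question
df <- read.table(text = "
M X Z
'bam' 12 'B1'
'sdr' 11 'B3'
'kar' 13 'B5'
'mmn' 13 'B7'
'bam' 14 'B4'
'kar' 17 'B1'
'bam' 10 'B6'
'zar' 11 'B8'
'mmn' 12 'B12'
", header = TRUE, stringsAsFactors = FALSE)

# Running count of each M value: ave() applies seq_along() within each M group
df$n <- ave(seq_len(nrow(df)), df$M, FUN = seq_along)
df$n
#> [1] 1 1 1 1 2 2 3 1 2
```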

I understand that I should show my own attempt, but I was unable to find a solution.

One possible way is to use unnest_wider() from tidyr. When a list column is unnested, the name of each list element is automatically used as the column name.
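To illustrate what "the name of each list element becomes the column name" means, here is a rough base-R imitation of that behaviour for two of the groups. It only approximates unnest_wider() and is not how tidyr implements it; the hand-built `records` list and the names `cols` and `wide` are hypothetical:

```r
# One named list per group, like the records the summarise() step builds for "bam" and "sdr"
records <- list(
  bam = list(X_1 = 12L, X_2 = 14L, X_3 = 10L, Z_1 = "B1", Z_2 = "B4", Z_3 = "B6"),
  sdr = list(X_1 = 11L, Z_1 = "B3")
)

# The union of all element names becomes the set of columns
cols <- unique(unlist(lapply(records, names)))

wide <- data.frame(M = names(records), stringsAsFactors = FALSE)
for (col in cols) {
  # An element missing from a group (e.g. X_2 for "sdr") is filled with NA
  wide[[col]] <- unname(sapply(records, function(r) if (is.null(r[[col]])) NA else r[[col]]))
}
wide
```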

I believe there is a better way to construct the records list, but currently this is the best I can think of.

library(dplyr)
library(tidyr)

df1 <- df %>%
    group_by(M) %>%
    # combine columns X and Z into one list column whose elements are named
    # X_1, X_2, ... and Z_1, Z_2, ...
    summarise(records = list(
        append(
            as.list(X) %>% setNames(paste0("X_",seq_along(X))),
            as.list(Z) %>% setNames(paste0("Z_",seq_along(Z)))
        ))
    ) %>%
    # when unnested, the name of each list element is automatically used as the column name
    unnest_wider(records)
> df1

# A tibble: 5 x 7
  M       X_1   X_2   X_3 Z_1   Z_2   Z_3  
  <chr> <int> <int> <int> <chr> <chr> <chr>
1 bam      12    14    10 B1    B4    B6   
2 kar      13    17    NA B5    B1    NA   
3 mmn      13    12    NA B7    B12   NA   
4 sdr      11    NA    NA B3    NA    NA   
5 zar      11    NA    NA B8    NA    NA   
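For comparison, the same 5-row wide table can be obtained in base R without building the list by hand: number the occurrences with ave(), then let stats::reshape() spread X and Z by that number. This swaps in a different technique from the answer above. reshape() keeps the rows in order of first appearance and names the new columns X.1, Z.1, X.2, ... rather than X_1, Z_1, ...; the helper column `n` is a hypothetical name:

```r
df <- read.table(text = "
M X Z
'bam' 12 'B1'
'sdr' 11 'B3'
'kar' 13 'B5'
'mmn' 13 'B7'
'bam' 14 'B4'
'kar' 17 'B1'
'bam' 10 'B6'
'zar' 11 'B8'
'mmn' 12 'B12'
", header = TRUE, stringsAsFactors = FALSE)

# Occurrence number within each M group (1 = first appearance)
df$n <- ave(seq_len(nrow(df)), df$M, FUN = seq_along)

# One row per M value; X and Z are spread into X.1, Z.1, X.2, Z.2, ... by n
wide <- reshape(df, idvar = "M", timevar = "n", direction = "wide")
rownames(wide) <- NULL
wide
```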

Here is one option: loop through the names of the dataset except the first one; for each column, group by 'M', summarise into a list and use unnest_wider. Then reduce the results to a single data.frame by joining the elements of the list, right_join with the 'M' column of df1, and reorder the columns of the dataset.
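The same per-column strategy (widen each column separately, then join the pieces by M) can be sketched in base R, with split() standing in for group_by()/summarise() and Reduce(merge, all = TRUE) standing in for reduce(full_join). This is an illustrative translation, not the answer's code; the function name `widen_one` is hypothetical:

```r
df <- read.table(text = "
M X Z
'bam' 12 'B1'
'sdr' 11 'B3'
'kar' 13 'B5'
'mmn' 13 'B7'
'bam' 14 'B4'
'kar' 17 'B1'
'bam' 10 'B6'
'zar' 11 'B8'
'mmn' 12 'B12'
", header = TRUE, stringsAsFactors = FALSE)

# Widen one column: one row per M value, one column per occurrence (e.g. X1, X2, X3)
widen_one <- function(col) {
  pieces <- split(df[[col]], factor(df$M, levels = unique(df$M)))
  k <- max(lengths(pieces))
  # `length<-` pads each group's values with NA up to k; rbind stacks one row per group
  mat <- do.call(rbind, lapply(pieces, `length<-`, k))
  out <- data.frame(M = names(pieces), mat, stringsAsFactors = FALSE)
  names(out)[-1] <- paste0(col, seq_len(k))
  out
}

# Widen every column except M, then full-join the pieces by M
wide <- Reduce(function(a, b) merge(a, b, by = "M", all = TRUE),
               lapply(names(df)[-1], widen_one))

# Interleave the columns by their numeric suffix: X1 Z1 X2 Z2 X3 Z3
suffix <- as.integer(sub("\\D+", "", names(wide)[-1]))
wide <- wide[, c(1, order(suffix) + 1)]
wide
```

Note that merge() sorts the rows by M by default, so unlike reshape() this version does not keep the order of first appearance.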

library(purrr)
library(tidyr)
library(dplyr)
library(stringr)
map(names(df)[-1], ~ df %>%
       dplyr::select(M, all_of(.x)) %>% 
       group_by(M) %>%
       summarise(!! .x := list(as.list(!! rlang::sym(.x)) %>% 
                     set_names(str_c(.x, seq_along(.)))))  %>% 
       unnest_wider(.x)) %>% 
  reduce(full_join, by = 'M') %>% 
  right_join(df1 %>%
                dplyr::select(M), by = 'M') %>% 
  dplyr::select(M, order(str_remove(names(.)[-1], "\\D+")) + 1)
# A tibble: 9 x 7
#  M        X1 Z1       X2 Z2       X3 Z3   
#  <fct> <int> <fct> <int> <fct> <int> <fct>
#1 bam      12 B1       14 B4       10 B6   
#2 sdr      11 B3       NA <NA>     NA <NA> 
#3 kar      13 B5       17 B1       NA <NA> 
#4 mmn      13 B7       12 B12      NA <NA> 
#5 <NA>     NA <NA>     NA <NA>     NA <NA> 
#6 <NA>     NA <NA>     NA <NA>     NA <NA> 
#7 <NA>     NA <NA>     NA <NA>     NA <NA> 
#8 zar      11 B8       NA <NA>     NA <NA> 
#9 <NA>     NA <NA>     NA <NA>     NA <NA> 
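The right_join() with the M column of df1 is what restores the 9-row, position-preserving layout: rows whose M value has already appeared become all-NA. A base-R sketch of just that step, using match() instead of a join and rebuilding the one-row-per-M table with stats::reshape() so the block runs on its own; the names `n`, `key`, `wide` and `out` are hypothetical:

```r
df <- read.table(text = "
M X Z
'bam' 12 'B1'
'sdr' 11 'B3'
'kar' 13 'B5'
'mmn' 13 'B7'
'bam' 14 'B4'
'kar' 17 'B1'
'bam' 10 'B6'
'zar' 11 'B8'
'mmn' 12 'B12'
", header = TRUE, stringsAsFactors = FALSE)

# One row per M value, X and Z spread by occurrence number
df$n <- ave(seq_len(nrow(df)), df$M, FUN = seq_along)
wide <- reshape(df, idvar = "M", timevar = "n", direction = "wide")

# Keep M only where a value appears for the first time; repeats become NA
key <- ifelse(duplicated(df$M), NA, df$M)

# match(NA, wide$M) is NA, and indexing a data frame with an NA row index gives an all-NA row
out <- wide[match(key, wide$M), ]
rownames(out) <- NULL
out
```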
