My data are as follows:
df <- read.table(text = "M X Z
'bam' 12 'B1'
'sdr' 11 'B3'
'kar' 13 'B5'
'mmn' 13 'B7'
'bam' 14 'B4'
'kar' 17 'B1'
'bam' 10 'B6'
'zar' 11 'B8'
'mmn' 12 'B12'", header = TRUE)
I want to move the replicated values of M into new columns. For example, "bam" appears three times. I want to keep it in the row where it first appears and move the X and Z values of its later occurrences into new columns of that row (X1 and Z1 for the second occurrence, X2 and Z2 for the third). The rows the duplicates came from should then be emptied, giving the following table:
df1 <- read.table(text = "M X Z X1 Z1 X2 Z2
'bam' 12 'B1' 14 'B4' 10 'B6'
'sdr' 11 'B3' NA NA NA NA
'kar' 13 'B5' 17 'B1' NA NA
'mmn' 13 'B7' 12 'B12' NA NA
NA NA NA NA NA NA NA
NA NA NA NA NA NA NA
NA NA NA NA NA NA NA
'zar' 11 'B8' NA NA NA NA
NA NA NA NA NA NA NA", header = TRUE)
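The core of this reshape is numbering each occurrence of a value of M in the order it appears (1 for the first "bam", 2 for the second, and so on); that number decides which pair of columns a row's X and Z move into. A minimal base-R sketch of just that step, assuming the input table above; the helper column occ is hypothetical:

```r
# Input data from the question
df <- read.table(text = "M X Z
bam 12 B1
sdr 11 B3
kar 13 B5
mmn 13 B7
bam 14 B4
kar 17 B1
bam 10 B6
zar 11 B8
mmn 12 B12", header = TRUE, stringsAsFactors = FALSE)

# occ (hypothetical helper): running count of each M value, in row order
df$occ <- ave(seq_along(df$M), df$M, FUN = seq_along)
df$occ
#> [1] 1 1 1 1 2 2 3 1 2
```

Rows with occ == 1 stay where they are; rows with occ == 2 supply X1/Z1, rows with occ == 3 supply X2/Z2.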
I know I should show my own attempt, but I was unable to find a solution.
One possible way is to use unnest_wider() from tidyr. When a list column is unnested, the name of each list element is automatically used as the column name.
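To see the naming behaviour in isolation, here is a small sketch with a hand-made list column (the two records are made-up values shaped like the question's data). Elements missing from a shorter record are filled with NA:

```r
library(tibble)
library(tidyr)

# Each element of 'records' is a named list; the names become column names
tb <- tibble(
  M = c("bam", "sdr"),
  records = list(
    list(X_1 = 12L, X_2 = 14L, Z_1 = "B1", Z_2 = "B4"),
    list(X_1 = 11L, Z_1 = "B3")  # shorter record: X_2 and Z_2 become NA
  )
)

wide <- unnest_wider(tb, records)
names(wide)  # columns M, X_1, X_2, Z_1, Z_2
```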
I believe there is a better way of constructing the records list, but this is the best I can think of for now.
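One alternative that avoids building the records list by hand is to number the rows within each M group and let pivot_wider() create the X_n and Z_n columns. This is a different technique from the answer's, sketched under the assumption of tidyr 1.1 or later; the helper column n is hypothetical. Unlike summarise(), pivot_wider() keeps the M values in order of first appearance rather than sorting them:

```r
library(dplyr)
library(tidyr)

df <- read.table(text = "M X Z
bam 12 B1
sdr 11 B3
kar 13 B5
mmn 13 B7
bam 14 B4
kar 17 B1
bam 10 B6
zar 11 B8
mmn 12 B12", header = TRUE, stringsAsFactors = FALSE)

df2 <- df %>%
  group_by(M) %>%
  mutate(n = row_number()) %>%                 # 1, 2, 3, ... within each M
  ungroup() %>%
  pivot_wider(names_from = n, values_from = c(X, Z))  # X_1, X_2, ..., Z_1, ...
df2
```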
library(dplyr)
library(tidyr)
df1 <- df %>%
group_by(M) %>%
# collect columns X and Z of each group into one named list column,
# with elements named X_1, X_2, ... and Z_1, Z_2, ...
summarise(records = list(
append(
as.list(X) %>% setNames(paste0("X_",seq_along(X))),
as.list(Z) %>% setNames(paste0("Z_",seq_along(Z)))
))
) %>%
# when unnested, the name of each list element is automatically used as the column name
unnest_wider(records)
> df1
# A tibble: 5 x 7
M X_1 X_2 X_3 Z_1 Z_2 Z_3
<chr> <int> <int> <int> <chr> <chr> <chr>
1 bam 12 14 10 B1 B4 B6
2 kar 13 17 NA B5 B1 NA
3 mmn 13 12 NA B7 B12 NA
4 sdr 11 NA NA B3 NA NA
5 zar 11 NA NA B8 NA NA
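Here is one option: loop over the names of the dataset except the first one (X and Z); for each, group by 'M', summarise the values into a named list and spread it into columns with unnest_wider; reduce the resulting list of data frames to a single one by joining them with full_join; then right_join with the M column of df1 so that all of its rows are kept, and finally reorder the columns so that X1, Z1, X2, Z2, ... are interleaved.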
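The reduce step folds a list of per-column tables into one by joining them pairwise on M. A minimal sketch with two hypothetical partial tables, assuming dplyr and purrr; the extra by = "M" argument is passed through reduce() to full_join():

```r
library(dplyr)
library(purrr)

# Two hypothetical partial results, one built from X and one from Z
parts <- list(
  tibble(M = c("bam", "kar"), X1 = c(12L, 13L)),
  tibble(M = c("bam", "sdr"), Z1 = c("B1", "B3"))
)

# Equivalent to full_join(parts[[1]], parts[[2]], by = "M");
# M values missing from one table get NA in that table's columns
joined <- reduce(parts, full_join, by = "M")
joined
```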
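library(purrr)
library(tidyr)
library(dplyr)
library(stringr)

# For each column except M (X, then Z), build a wide table: M, X1, X2, X3, ...
map(names(df)[-1], ~ df %>%
      dplyr::select(M, all_of(.x)) %>%
      group_by(M) %>%
      # collect the group's values into a list named X1, X2, ... (or Z1, Z2, ...)
      summarise(!! .x := list(as.list(!! rlang::sym(.x)) %>%
                  set_names(str_c(.x, seq_along(.))))) %>%
      unnest_wider(.x)) %>%
  # combine the per-column tables into one by joining on M
  reduce(full_join, by = 'M') %>%
  # keep every row of df1's M column, including its NA rows
  right_join(df1 %>%
               dplyr::select(M), by = 'M') %>%
  # interleave the columns as X1, Z1, X2, Z2, ... by their trailing number
  dplyr::select(M, order(str_remove(names(.)[-1], "\\D+")) + 1)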
# A tibble: 9 x 7
# M X1 Z1 X2 Z2 X3 Z3
# <fct> <int> <fct> <int> <fct> <int> <fct>
#1 bam 12 B1 14 B4 10 B6
#2 sdr 11 B3 NA <NA> NA <NA>
#3 kar 13 B5 17 B1 NA <NA>
#4 mmn 13 B7 12 B12 NA <NA>
#5 <NA> NA <NA> NA <NA> NA <NA>
#6 <NA> NA <NA> NA <NA> NA <NA>
#7 <NA> NA <NA> NA <NA> NA <NA>
#8 zar 11 B8 NA <NA> NA <NA>
#9 <NA> NA <NA> NA <NA> NA <NA>
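The final select() interleaves the columns by stripping the letters from each name and ordering by the number that remains. A small base-R illustration on a hypothetical name vector, using sub() in place of str_remove() and converting the digits to integer. order() is stable, so X stays ahead of Z within each number:

```r
nm <- c("X1", "X2", "X3", "Z1", "Z2", "Z3")

# Drop the non-digit prefix, order by the occurrence number
idx <- order(as.integer(sub("\\D+", "", nm)))
nm[idx]
#> [1] "X1" "Z1" "X2" "Z2" "X3" "Z3"
```

Ordering the digits as integers rather than as text matters only once a value occurs ten or more times; as text, "10" would sort before "2".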