Moving replicated data into the next columns in R
My data are as follows:
df <- read.table(text = "M X Z
'bam' 12 'B1'
'sdr' 11 'B3'
'kar' 13 'B5'
'mmn' 13 'B7'
'bam' 14 'B4'
'kar' 17 'B1'
'bam' 10 'B6'
'zar' 11 'B8'
'mmn' 12 'B12'
", header = TRUE)
I want to move the replicated data into the next columns. Consider "bam": it appears three times. I want to move its later occurrences into new columns on the row where it appears for the first time; other replicated values should be moved into further columns in the same way. Once the replicated data have been moved, they should be removed from their original rows, giving the following table:
df1 <- read.table(text = "M X Z X1 Z1 X2 Z2
'bam' 12 'B1' 14 'B4' 10 'B6'
'sdr' 11 'B3' NA NA NA NA
'kar' 13 'B5' 17 'B1' NA NA
'mmn' 13 'B7' 12 'B12' NA NA
NA NA NA NA NA NA NA
NA NA NA NA NA NA NA
NA NA NA NA NA NA NA
'zar' 11 'B8' NA NA NA NA
NA NA NA NA NA NA NA
", header = TRUE)
I understand that I should show my own attempt, but I was unable to find a solution.
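The transformation described in the question can be sketched in base R by numbering each row's occurrence within its M group and writing the k-th occurrence into columns Xk and Zk on the row where that key first appears. This is a minimal sketch that is not taken from either answer below; it assumes the input has exactly the columns M, X and Z, and the output names X1, Z1, X2, ... are chosen here for illustration.

```r
# Input from the question, built explicitly so the sketch is self-contained
df <- data.frame(
  M = c("bam", "sdr", "kar", "mmn", "bam", "kar", "bam", "zar", "mmn"),
  X = c(12L, 11L, 13L, 13L, 14L, 17L, 10L, 11L, 12L),
  Z = c("B1", "B3", "B5", "B7", "B4", "B1", "B6", "B8", "B12"),
  stringsAsFactors = FALSE
)

# Occurrence number of each row within its M group (1 = first "bam", 2 = second, ...)
occ <- ave(seq_along(df$M), df$M, FUN = seq_along)
# Row index where each key first appears; later occurrences are moved onto that row
first_row <- match(df$M, df$M)

# Keep M only on first-occurrence rows; rows that held duplicates become NA
out <- data.frame(M = ifelse(occ == 1, df$M, NA_character_),
                  stringsAsFactors = FALSE)
for (k in seq_len(max(occ))) {
  sel <- occ == k                      # rows holding the k-th occurrence of their key
  xk <- rep(NA_integer_, nrow(df))
  zk <- rep(NA_character_, nrow(df))
  xk[first_row[sel]] <- df$X[sel]      # move X values onto the first-occurrence row
  zk[first_row[sel]] <- df$Z[sel]      # same for Z
  out[[paste0("X", k)]] <- xk
  out[[paste0("Z", k)]] <- zk
}
out
```

Keeping all nine rows (with the emptied ones set to NA) mirrors the layout of the expected table; dropping them afterwards with `out[!is.na(out$M), ]` would give one row per key instead.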
One possible way is to use unnest_wider() in tidyr. When a list column is unnested, the name of each list element is automatically used as the column name.
I believe there is a better way of constructing the records list, but currently this is the best I can think of.
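The naming behaviour that this answer relies on can be seen in isolation with a tiny, hypothetical two-row tibble: each element of the list column carries its own names, and unnest_wider() turns those names into columns, filling NA where an element is missing.

```r
library(tibble)
library(tidyr)

# Hypothetical two-row example: each list element carries its own names
tb <- tibble(
  M = c("bam", "sdr"),
  records = list(list(X_1 = 12, X_2 = 14), list(X_1 = 11))
)
# unnest_wider() turns every element name into a column; missing names become NA
wide <- unnest_wider(tb, records)
wide
```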
library(dplyr)
library(tidyr)
df1 <- df %>%
group_by(M) %>%
# convert columns X and Z into one list column whose elements are named
# X_1, X_2, ... and Z_1, Z_2, ...
summarise(records = list(
append(
as.list(X) %>% setNames(paste0("X_",seq_along(X))),
as.list(Z) %>% setNames(paste0("Z_",seq_along(Z)))
))
) %>%
# when unnested, the name of each list element is automatically used as the column name
unnest_wider(records)
> df1
# A tibble: 5 x 7
M X_1 X_2 X_3 Z_1 Z_2 Z_3
<chr> <int> <int> <int> <chr> <chr> <chr>
1 bam 12 14 10 B1 B4 B6
2 kar 13 17 NA B5 B1 NA
3 mmn 13 12 NA B7 B12 NA
4 sdr 11 NA NA B3 NA NA
5 zar 11 NA NA B8 NA NA
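A possible alternative, not from either answer, is to number the occurrences with row_number() and let pivot_wider() build the columns directly. This sketch assumes tidyr 1.2.0 or later, since the names_vary argument (used here to interleave X and Z as X_1, Z_1, X_2, ...) was added in that release; rows come out in order of first appearance rather than alphabetically.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  M = c("bam", "sdr", "kar", "mmn", "bam", "kar", "bam", "zar", "mmn"),
  X = c(12L, 11L, 13L, 13L, 14L, 17L, 10L, 11L, 12L),
  Z = c("B1", "B3", "B5", "B7", "B4", "B1", "B6", "B8", "B12"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  group_by(M) %>%
  mutate(occ = row_number()) %>%   # 1, 2, 3, ... within each M
  ungroup() %>%
  pivot_wider(
    names_from = occ,
    values_from = c(X, Z),
    names_vary = "slowest"         # interleave: X_1, Z_1, X_2, Z_2, ...
  )
wide
```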
Here is one option: loop through the names of the dataset except the first one; for each column, group by 'M', summarise its values into a list and use unnest_wider. Then reduce the results to a single data.frame by joining the list elements, right_join with the 'M' column of df1 to restore the row layout, and reorder the columns of the dataset.
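The reduce step merges the per-column tables pairwise. On two hypothetical, already-widened pieces it behaves like this: full_join keeps keys that appear in only one table and fills the missing columns with NA.

```r
library(dplyr)
library(purrr)

# Two hypothetical per-column tables keyed by M, as the map() step would produce
parts <- list(
  tibble(M = c("bam", "sdr"), X1 = c(12L, 11L)),
  tibble(M = c("bam", "zar"), Z1 = c("B1", "B8"))
)
# reduce() applies full_join pairwise; keys missing from either table are kept with NA
joined <- reduce(parts, full_join, by = "M")
joined
```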
library(purrr)
library(tidyr)
library(dplyr)
library(stringr)
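map(names(df)[-1], ~ df %>%
      # keep the grouping column plus the current column (X or Z)
      dplyr::select(M, all_of(.x)) %>%
      group_by(M) %>%
      # collapse each group's values into a named list: X1, X2, ... or Z1, Z2, ...
      summarise(!! .x := list(as.list(!! rlang::sym(.x)) %>%
                  set_names(str_c(.x, seq_along(.))))) %>%
      # spread the named list into one column per element
      unnest_wider(.x)) %>%
  # join the per-column results back together by M
  reduce(full_join, by = 'M') %>%
  # df1 is assumed to be the 9-row expected-output table from the question;
  # only its M column is used, to restore the original row layout
  right_join(df1 %>%
               dplyr::select(M), by = 'M') %>%
  # reorder as M, X1, Z1, X2, Z2, ... by the numeric suffix of each name
  dplyr::select(M, order(str_remove(names(.)[-1], "\\D+")) + 1)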
# A tibble: 9 x 7
# M X1 Z1 X2 Z2 X3 Z3
# <fct> <int> <fct> <int> <fct> <int> <fct>
#1 bam 12 B1 14 B4 10 B6
#2 sdr 11 B3 NA <NA> NA <NA>
#3 kar 13 B5 17 B1 NA <NA>
#4 mmn 13 B7 12 B12 NA <NA>
#5 <NA> NA <NA> NA <NA> NA <NA>
#6 <NA> NA <NA> NA <NA> NA <NA>
#7 <NA> NA <NA> NA <NA> NA <NA>
#8 zar 11 B8 NA <NA> NA <NA>
#9 <NA> NA <NA> NA <NA> NA <NA>
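The final select() orders the non-M columns by their numeric suffix. Because names(.)[-1] drops M, the answer adds 1 to map the positions back onto the full data frame. A small base-R sketch of the same idea (using sub() instead of stringr's str_remove()) shows why it interleaves X and Z: order() leaves ties in their original order, so within each suffix X stays ahead of Z. Converting the suffix to an integer, as done here, is a variation on the answer that keeps "10" from sorting before "2" when there are ten or more occurrences.

```r
# Column names after the joins (M excluded), as in the answer above
nm <- c("X1", "X2", "X3", "Z1", "Z2", "Z3")
# Strip the letters and order by the remaining numeric suffix;
# ties keep their original order, so X comes before Z within each suffix
idx <- order(as.integer(sub("\\D+", "", nm)))
nm[idx]
```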