Convert a single column into multiple columns based on a delimiter in R

Question

I have the following dataframe:

ID Parts
-- -----
1  A:B::
2  X2:::
3  ::J4:
4  A:C:D:G4:X6

I would like to convert the Parts column into multiple columns using the : delimiter, so it should look like:

ID A  B  X2 J4 C  D  G4 X6 ........
-- -- -- -- -- -- -- -- --
1  A  B  na na na na na na
2  na na X2 na na na na na
3  na na na J4 na na na na
4  A  na na na C  D  G4 X6

I would not know the number of potential columns in advance.

I have met my match on this one: I can split by delimiter with strsplit(), but only when the Parts column holds a fixed number of entities.

Answer 1

You can use a combination of tidyr::separate, tidyr::pivot_longer, and tidyr::pivot_wider. First, use stringr::str_count to determine the number of columns to split Parts into, which is not the same as the number of unique values (see How it works):

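library(dplyr)
library(tidyr)
library(stringr)

# The widest row decides how many columns separate() needs: delimiters + 1
n_col <- max(stringr::str_count(df$Parts, ":")) + 1

df %>% 
  # Split Parts into col1..colN; shorter rows are padded with NA (with a warning)
  tidyr::separate(Parts, into = paste0("col", 1:n_col), sep = ":") %>% 
  # Turn empty strings into NA; ID is skipped because recent dplyr versions
  # refuse to compare an integer column with ""
  dplyr::mutate(dplyr::across(-ID, ~dplyr::na_if(., ""))) %>% 
  # Stack col1..colN into name/value pairs, one row per ID and piece
  tidyr::pivot_longer(-ID) %>% 
  dplyr::select(-name) %>% 
  tidyr::drop_na() %>% 
  # One column per unique value; missing ID/value combinations become NA
  tidyr::pivot_wider(id_cols = ID,
                     names_from = value)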


     ID A     B     X2    J4    C     D     G4    X6   
  <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1     1 A     B     NA    NA    NA    NA    NA    NA   
2     2 NA    NA    X2    NA    NA    NA    NA    NA   
3     3 NA    NA    NA    J4    NA    NA    NA    NA   
4     4 A     NA    NA    NA    C     D     G4    X6

How it works

You do not need to know the number of unique values with this code; the pivots take care of that. What you do need to know is how many new columns Parts will be split into by separate. That is easy to find: count the delimiters with str_count and add one. This gives you the right number of columns to separate Parts into by your delimiter.
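As a small sketch of the counting step, the snippet below re-creates the example data inline (an assumption made only so it runs on its own) and shows the intermediate counts. The name n_delims is introduced here for illustration and does not appear in the answer.

```r
library(stringr)

# Example data re-created inline so the snippet is self-contained
df <- data.frame(
  ID = 1:4,
  Parts = c("A:B::", "X2:::", "::J4:", "A:C:D:G4:X6"),
  stringsAsFactors = FALSE
)

# Number of ":" delimiters in each row
n_delims <- stringr::str_count(df$Parts, ":")

# A string with k delimiters splits into k + 1 pieces,
# so the widest row determines how many columns separate() needs
n_col <- max(n_delims) + 1
```

For this data the first three rows contain three delimiters and the last contains four, so n_col is 5.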

This works because pivot_longer, once the name column is dropped, leaves a two-column dataframe with repeated ID values and a column holding the delimited values of Parts: an ID, Parts pairing. Then, when you use pivot_wider, a column is created automatically for each unique value of Parts, and the value is retained within that column. pivot_wider fills in NA wherever an ID and Parts combination is not found.
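To make the intermediate ID, Parts pairing concrete, here is a sketch that stops the pipeline just before pivot_wider. It re-creates the example data inline, and it makes two assumptions that differ from the answer: fill = "right" is passed to separate to pad short rows without a warning, and across is restricted to the non-ID columns. The name long is hypothetical.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Example data re-created inline so the snippet is self-contained
df <- data.frame(
  ID = 1:4,
  Parts = c("A:B::", "X2:::", "::J4:", "A:C:D:G4:X6"),
  stringsAsFactors = FALSE
)

n_col <- max(stringr::str_count(df$Parts, ":")) + 1

long <- df %>%
  # Split into col1..colN; fill = "right" pads short rows with NA silently
  tidyr::separate(Parts, into = paste0("col", 1:n_col), sep = ":",
                  fill = "right") %>%
  # Empty pieces become NA (ID is left alone)
  dplyr::mutate(dplyr::across(-ID, ~ dplyr::na_if(.x, ""))) %>%
  # One row per ID and piece, then keep only the ID/value pairing
  tidyr::pivot_longer(-ID) %>%
  dplyr::select(-name) %>%
  tidyr::drop_na()

print(long)
```

Each ID appears once per non-empty piece, so ID 4 contributes five rows. This long table is what pivot_wider then spreads into one column per unique value.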

Try running this pipe by pipe to better understand it, if need be.

Data

lines <- "
ID Parts
1  A:B::
2  X2:::
3  ::J4:
4  A:C:D:G4:X6
"

df <- read.table(text = lines, header = TRUE)

Answer 2

Could the separate function from tidyr be what you are looking for?

https://tidyr.tidyverse.org/reference/separate.html

It might require some fancy regex implementation, but could potentially work.
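One way this idea might be fleshed out without any regex is sketched below. It uses tidyr::separate_rows, a sibling of separate that this answer does not name, so treat it as an assumption about what could work rather than what the answerer had in mind. The example data is re-created inline, and the helper column present and the result name wide are hypothetical.

```r
library(dplyr)
library(tidyr)

# Example data re-created inline so the snippet is self-contained
df <- data.frame(
  ID = 1:4,
  Parts = c("A:B::", "X2:::", "::J4:", "A:C:D:G4:X6"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  # One row per delimited piece, so the number of pieces need not be known
  tidyr::separate_rows(Parts, sep = ":") %>%
  # Drop the empty pieces produced by consecutive delimiters
  dplyr::filter(Parts != "") %>%
  # Copy the piece so it can serve as the cell value as well as the column name
  dplyr::mutate(present = Parts) %>%
  tidyr::pivot_wider(id_cols = ID, names_from = Parts, values_from = present)

print(wide)
```

Because the pieces go straight to rows, this variant skips the column-count step entirely; the trade-off is that separate_rows is marked as superseded in recent tidyr releases.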

Answer 1
1, accepted, 2021-02-03 17:32:06

Answer 2
0, 2021-02-03 16:36:15
