
Convert a single column into multiple columns based on delimiter in R

I have the following dataframe:

ID Parts
-- -----
1  A:B::
2  X2:::
3  ::J4:
4  A:C:D:G4:X6

I would like to convert the Parts column into multiple columns, using ":" as the delimiter, so that it looks like this:

ID A  B  X2  J4  C  D  G4  X6 ........
-- -  -  --  --  -  -  --  -- 
1  A  B  na  na  na na na  na
2  na na X2  na  na na na  na
3  na na na  J4  na na na  na
4  A  na na  na  C  D  G4  X6

I do not know the number of possible columns in advance.

I have met my match on this one: I can split on the delimiter with strsplit(), but only when the Parts column has a fixed number of entries.

You can use a combination of tidyr::separate, tidyr::pivot_longer, and tidyr::pivot_wider. First, count the delimiters (here with stringr::str_count) to find how many columns separate() needs to split Parts into. This is the maximum number of pieces in a row, not the number of unique values (see How it works):

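library(dplyr)
library(tidyr)
library(stringr)

# pieces in the longest row = number of ":" delimiters + 1
n_col <- max(stringr::str_count(df$Parts, ":")) + 1

df %>% 
  # split Parts into n_col positional columns; shorter rows are padded with NA
  tidyr::separate(Parts, into = paste0("col", 1:n_col), sep = ":", fill = "right") %>% 
  # turn the empty strings left by consecutive ":" into NA (ID is left alone)
  dplyr::mutate(across(-ID, ~dplyr::na_if(., ""))) %>% 
  # one row per (ID, piece) pair
  tidyr::pivot_longer(-ID) %>% 
  # the positional column names are no longer needed
  dplyr::select(-name) %>% 
  tidyr::drop_na() %>% 
  # one column per distinct piece; missing ID/piece combinations become NA
  tidyr::pivot_wider(id_cols = ID,
                     names_from = value)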


     ID A     B     X2    J4    C     D     G4    X6   
  <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1     1 A     B     NA    NA    NA    NA    NA    NA   
2     2 NA    NA    X2    NA    NA    NA    NA    NA   
3     3 NA    NA    NA    J4    NA    NA    NA    NA   
4     4 A     NA    NA    NA    C     D     G4    X6 

How it works

With this code you do not need to know the number of unique values; the pivots take care of that. What you do need to know is how many columns separate() should split Parts into. That is easy to get: count the delimiters in each row with str_count(), add one, and take the maximum. This gives separate() enough columns to hold every piece of Parts.
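To make the counting step concrete, here is a small sketch on the four Parts strings from the example (the vector name parts is only for illustration). It counts pieces per row, which is not the same as the number of final columns.

```r
library(stringr)

# the Parts values from the example dataframe
parts <- c("A:B::", "X2:::", "::J4:", "A:C:D:G4:X6")

# each row has one more piece than it has ":" delimiters
pieces_per_row <- str_count(parts, ":") + 1
print(pieces_per_row)

# the widest row decides how many temporary columns separate() needs
n_col <- max(pieces_per_row)
print(n_col)
```

Here n_col is 5, even though the final result has eight part columns (A, B, X2, J4, C, D, G4, X6).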

The pivots then do the rest. pivot_longer() turns the positional columns into a long dataframe with one row per piece, so each ID is repeated; after dropping the name column and the NA rows, you are left with two columns: ID and the pieces of Parts (an ID, Parts pairing). pivot_wider() then creates one column for each unique value of Parts and keeps that value in the cell. It automatically fills in NA wherever an ID and Parts combination is not found.
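A toy version of the last step may help. The long table below is a hypothetical, shortened stand-in for what the pipeline holds just before pivot_wider(); using the same column for names_from and values_from puts each piece in a column named after itself.

```r
library(tibble)
library(tidyr)

# hypothetical long-format data: one row per (ID, piece) pair
long <- tibble(
  ID    = c(1, 1, 2, 3, 4, 4),
  value = c("A", "B", "X2", "J4", "A", "C")
)

# each distinct piece becomes a column holding the piece itself;
# ID/piece combinations that never occur are filled with NA
wide <- pivot_wider(long, id_cols = ID, names_from = value, values_from = value)
print(wide)
```

Columns appear in order of first appearance (A, B, X2, J4, C), which is why the full result keeps the order shown above.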

If needed, try running the pipe one step at a time to see what each step does.


Data

lines <- "
ID Parts
1  A:B::
2  X2:::
3  ::J4:
4  A:C:D:G4:X6
"

df <- read.table(text = lines, header = TRUE)
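If you are on tidyr 1.3.0 or later, separate() is superseded, and separate_longer_delim() can split Parts straight into rows, so there is no need to count delimiters first. A minimal sketch, assuming that tidyr version and reusing the Data above:

```r
library(dplyr)
library(tidyr)

lines <- "
ID Parts
1  A:B::
2  X2:::
3  ::J4:
4  A:C:D:G4:X6
"
df <- read.table(text = lines, header = TRUE)

wide <- df %>%
  # one row per piece; the number of pieces per row does not matter
  separate_longer_delim(Parts, delim = ":") %>%
  # drop the empty strings produced by consecutive or trailing ":"
  filter(Parts != "") %>%
  # one column per distinct piece, holding the piece itself
  pivot_wider(id_cols = ID, names_from = Parts, values_from = Parts)

print(wide)
```

Filtering out the empty strings here plays the role of na_if() plus drop_na() in the original pipeline.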

Could the separate function from tidyr be what you are looking for?

https://tidyr.tidyverse.org/reference/separate.html

It might require some fancy regex, but it could work.
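As a quick check of that idea: separate() treats sep as a regular expression, but ":" has no special meaning in a regex, so no escaping is needed. The catch, shown in this sketch, is that pieces land in positional columns (B and C both end up in col2), which is why the pivots are still needed afterwards. The column names col1 to col5 are illustrative.

```r
library(tidyr)

df <- data.frame(
  ID    = 1:4,
  Parts = c("A:B::", "X2:::", "::J4:", "A:C:D:G4:X6")
)

# split on a literal ":"; fill = "right" pads rows with fewer pieces with NA
positional <- separate(df, Parts, into = paste0("col", 1:5), sep = ":", fill = "right")
print(positional)
```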
