Split multiple columns on "/" and ","

Question

I'm cleaning some data in which multiple columns need to be split into rows on both ',' and '/'. The data.table below shows what the source data looks like.

library(data.table)

df <- data.table(
  b = c("a", "d/e/f", "g,h"),
  c = c("1", "2,3,4", "5/6")
)

I've tried using separate_rows, but it only seems to split one column, on one of these separators, at a time.

EDIT: The data.table I'm looking for looks roughly like this:

df_clean <- data.table(
  b = c("a", "d", "d", "d", 
        "e", "e", "e", "f", 
        "f", "f", "g", "g",
        "h", "h"),
  c = c("1", "2", "3", "4",
        "2", "3", "4",
        "2", "3", "4",
        "5", "6", 
        "5", "6")
)
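Put differently, each source row should expand into every pairing of its split b values with its split c values, so the expected row count is the sum over rows of (pieces in b) × (pieces in c). A minimal base R sketch of that arithmetic, assuming the regex ",|/" as the separator (the names b_col, c_col, n_b and n_c are illustrative only):

```r
sep <- ",|/"  # regex alternation: split on "," or "/"
b_col <- c("a", "d/e/f", "g,h")
c_col <- c("1", "2,3,4", "5/6")

# number of pieces each source row produces in each column
n_b <- lengths(strsplit(b_col, sep))  # 1 3 2
n_c <- lengths(strsplit(c_col, sep))  # 1 3 2

# every b piece is paired with every c piece of the same row
rows_per_source <- n_b * n_c          # 1 9 4
total_rows <- sum(rows_per_source)    # 14
```

This matches the 14 rows of df_clean above.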

Answer 1

Updated answer based on the added clarification.

Run separate_rows once on each column to get every combination of values within a row. A regex pattern such as '/|,' lets you specify multiple separators.

library(tidyr)

df %>%
  separate_rows(b, sep = '/|,') %>%
  separate_rows(c, sep = '/|,')

#> # A tibble: 14 × 2
#>    b     c    
#>    <chr> <chr>
#>  1 a     1    
#>  2 d     2    
#>  3 d     3    
#>  4 d     4    
#>  5 e     2    
#>  6 e     3    
#>  7 e     4    
#>  8 f     2    
#>  9 f     3    
#> 10 f     4    
#> 11 g     5    
#> 12 g     6    
#> 13 h     5    
#> 14 h     6
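Why one call per column rather than one call with both columns: when several columns are passed to a single separate_rows() call, their pieces are paired by position instead of crossed. A small sketch on a plain data.frame (assuming tidyr is installed and R >= 4.1 for the native pipe):

```r
library(tidyr)

df <- data.frame(
  b = c("a", "d/e/f", "g,h"),
  c = c("1", "2,3,4", "5/6")
)

# One call, both columns: d pairs with 2, e with 3, f with 4 (positional)
paired <- separate_rows(df, b, c, sep = "/|,")

# One call per column: each b piece is combined with each c piece of its row
crossed <- df |>
  separate_rows(b, sep = "/|,") |>
  separate_rows(c, sep = "/|,")

nrow(paired)   # 6
nrow(crossed)  # 14
```

The positional form also requires both columns to split into the same number of pieces in every row, which happens to hold for this data but not in general.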

Answer 2

Maybe this helps: https://stackoverflow.com/questions/15347282/split-delimited-strings-in-a-column-and-insert-as-new-rows

For the first column:

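# split b on "," or "/" with one regex; a vector of patterns such as
# c(",", "/") would be recycled element by element instead
s <- strsplit(df$b, split = ",|/")
# repeat each c value once per piece of b
data.frame(b = unlist(s), c = rep(df$c, lengths(s)))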
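The same strsplit() + rep() idea can be applied once per column to reach the full result in base R. A minimal sketch, assuming ",|/" as the separator; split_long() is a hypothetical helper, not part of any package:

```r
# hypothetical helper: split one column into rows, repeating the other columns
split_long <- function(d, col, sep = ",|/") {
  parts <- strsplit(d[[col]], sep)
  # repeat each source row once per piece in `col`
  idx <- rep(seq_len(nrow(d)), lengths(parts))
  out <- d[idx, , drop = FALSE]
  out[[col]] <- unlist(parts)
  rownames(out) <- NULL
  out
}

df <- data.frame(
  b = c("a", "d/e/f", "g,h"),
  c = c("1", "2,3,4", "5/6")
)

# split b first, then c, so every b piece meets every c piece of its row
res <- split_long(split_long(df, "b"), "c")
res
```

Splitting b first and c second also reproduces the row order of the desired df_clean.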

Answer 3

An option with cSplit:

library(splitstackshape)
cSplit(df, "b", sep = "/|,", "long", fixed = FALSE) |> 
   cSplit("c", sep = "/|,", "long", fixed = FALSE)

-output

    b c
 1: a 1
 2: d 2
 3: d 3
 4: d 4
 5: e 2
 6: e 3
 7: e 4
 8: f 2
 9: f 3
10: f 4
11: g 5
12: g 6
13: h 5
14: h 6

Answer 4

A data.table option:

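# option 1
library(data.table)

# split a string on "," or "/"
foo <- \(x) unlist(strsplit(x, ",|/"))
# by = .I handles one source row at a time (needs a data.table version that
# supports by = .I); CJ() crosses that row's pieces and sorts them by default
df[, do.call(CJ, lapply(.SD, foo)), by = .I][, !"I"]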

Similarly, in base R:

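sep <- ",|/"
Map(
  # name the columns, keep strings as character, and list c first so it
  # varies fastest within each b value; then restore the b, c column order
  \(x, y) expand.grid(c = y, b = x, stringsAsFactors = FALSE)[c("b", "c")],
  strsplit(df$b, sep),
  strsplit(df$c, sep)
) |> 
  do.call(rbind, args = _)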

Result (from the data.table option):

#          b      c
#     <char> <char>
#  1:      a      1
#  2:      d      2
#  3:      d      3
#  4:      d      4
#  5:      e      2
#  6:      e      3
#  7:      e      4
#  8:      f      2
#  9:      f      3
# 10:      f      4
# 11:      g      5
# 12:      g      6
# 13:      h      5
# 14:      h      6
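A detail worth knowing about the base R option: expand.grid() varies its first argument fastest, names unnamed arguments Var1, Var2, and so on, and converts strings to factors unless stringsAsFactors = FALSE. A small sketch on the pieces of row 2 (the names g and unnamed are illustrative only):

```r
# defaults: unnamed arguments become Var1/Var2 and strings become factors
unnamed <- expand.grid(c("d", "e"), c("2", "3"))
names(unnamed)             # "Var1" "Var2"
is.factor(unnamed$Var1)    # TRUE

# named, character columns; c listed first so it varies fastest,
# then the columns are put back in b, c order
g <- expand.grid(c = c("2", "3", "4"), b = c("d", "e", "f"),
                 stringsAsFactors = FALSE)[c("b", "c")]
g$b  # "d" "d" "d" "e" "e" "e" "f" "f" "f"
g$c  # "2" "3" "4" "2" "3" "4" "2" "3" "4"
```

Listing c before b is what makes the base R rows come out in the same order as the result above.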

Answers (score, posted)

Answer 1: score 3, 2022-11-30 11:10:35
Answer 2: score 2, 2022-11-30 11:12:43
Answer 3: score 1, 2022-11-30 16:44:39
Answer 4: score 1, 2022-11-30 17:40:44