Separate Multiple Columns by "/" and ","

I'm cleaning some data in which multiple columns need to be split into rows on both ',' and '/'. The data.table below shows what the source data looks like.

library(data.table)

df <- data.table(
  b = c("a", "d/e/f", "g,h"),
  c = c("1", "2,3,4", "5/6")
)

I've tried using separate_rows, but as far as I can tell it can only split one column on one of these separators at a time.

EDIT: The data.table I'm looking for looks approximately like this:

df_clean <- data.table(
  b = c("a", "d", "d", "d", 
        "e", "e", "e", "f", 
        "f", "f", "g", "g",
        "h", "h"),
  c = c("1", "2", "3", "4",
        "2", "3", "4",
        "2", "3", "4",
        "5", "6", 
        "5", "6")
)

Updated answer based on added clarification.

Run separate_rows once on each column to get every combination of the split values. The sep argument is a regex pattern, so a single pattern can match multiple separators.

library(tidyr)

df %>%
  separate_rows(b, sep = '/|,') %>%
  separate_rows(c, sep = '/|,')

#> # A tibble: 14 × 2
#>    b     c    
#>    <chr> <chr>
#>  1 a     1    
#>  2 d     2    
#>  3 d     3    
#>  4 d     4    
#>  5 e     2    
#>  6 e     3    
#>  7 e     4    
#>  8 f     2    
#>  9 f     3    
#> 10 f     4    
#> 11 g     5    
#> 12 g     6    
#> 13 h     5    
#> 14 h     6

Maybe this helps: https://stackoverflow.com/questions/15347282/split-delimited-strings-in-a-column-and-insert-as-new-rows

For the first column:

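# Split on either separator. A regex is used because a vector passed to
# `split` is recycled along the input, not applied to every element.
s <- strsplit(df$b, split = "[,/]")
data.frame(b = unlist(s), c = rep(df$c, lengths(s)))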
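
One thing to be aware of with strsplit: when split is a character vector of length greater than one, it is recycled along the input rather than treated as a set of alternative separators. A minimal base R sketch of the difference follows; the sample vector x is made up for illustration and is not from the question.

```r
# Hypothetical sample input containing both separators in each element
x <- c("a,b/c", "d/e,f")

# A length-2 `split` is recycled: element 1 is split on ",", element 2 on "/"
recycled <- strsplit(x, split = c(",", "/"))

# A single regex character class splits every element on either separator
regex <- strsplit(x, split = "[,/]")

print(recycled)
print(regex)
```

With the recycled form, "b/c" and "e,f" are left unsplit, so a single regex such as "[,/]" or ",|/" is the safer choice when any element may contain either separator.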

An option with cSplit

library(splitstackshape)
cSplit(df, "b", sep = "/|,", direction = "long", fixed = FALSE) |>
  cSplit("c", sep = "/|,", direction = "long", fixed = FALSE)

Output

    b c
 1: a 1
 2: d 2
 3: d 3
 4: d 4
 5: e 2
 6: e 3
 7: e 4
 8: f 2
 9: f 3
10: f 4
11: g 5
12: g 6
13: h 5
14: h 6

A data.table option:

# option 1
foo = \(x) unlist(strsplit(x, ",|/"))
df[, do.call(CJ, lapply(.SD, foo)), .I][, !"I"]

Similarly in base R:

sep = ",|/"
Map(
  expand.grid,
  strsplit(df$b, sep),
  strsplit(df$c, sep)
) |> 
  do.call(rbind, args = _)

Result

#          b      c
#     <char> <char>
#  1:      a      1
#  2:      d      2
#  3:      d      3
#  4:      d      4
#  5:      e      2
#  6:      e      3
#  7:      e      4
#  8:      f      2
#  9:      f      3
# 10:      f      4
# 11:      g      5
# 12:      g      6
# 13:      h      5
# 14:      h      6
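
If the base R route needs to return columns named b and c as plain character vectors, in the same row order as the desired output, something along these lines might work. This is a sketch only: the helper name split_cross is hypothetical, and it relies on expand.grid varying its first argument fastest, which is why c is passed first and the columns are reordered afterwards.

```r
# Same sample data as in the question, as a plain data.frame
df <- data.frame(
  b = c("a", "d/e/f", "g,h"),
  c = c("1", "2,3,4", "5/6"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split both columns on "," or "/" and build, row by row,
# every combination of the resulting pieces
split_cross <- function(b, c, sep = "[,/]") {
  rows <- Map(
    function(x, y) {
      # c goes first so it varies fastest; then restore the b, c column order
      expand.grid(c = y, b = x, stringsAsFactors = FALSE)[, c("b", "c")]
    },
    strsplit(b, sep),
    strsplit(c, sep)
  )
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

res <- split_cross(df$b, df$c)
print(res)
```

Setting stringsAsFactors = FALSE keeps both columns as character rather than factor, which avoids surprises when the per-row pieces are bound together.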
