Separate Multiple Columns by "/" and ","
I'm cleaning some data where multiple columns need to be split into rows on both ',' and '/'. The data.table below shows what the source data looks like.
library(data.table)
df <- data.table(
b = c("a", "d/e/f", "g,h"),
c = c("1", "2,3,4", "5/6")
)
I've tried using separate_rows, but it can only split one column on one of these separators at a time.
EDIT: The data.table I'm looking for looks approximately like this:
df_clean <- data.table(
b = c("a", "d", "d", "d",
"e", "e", "e", "f",
"f", "f", "g", "g",
"h", "h"),
c = c("1", "2", "3", "4",
"2", "3", "4",
"2", "3", "4",
"5", "6",
"5", "6")
)
Updated answer based on the added clarification.
Run separate_rows once on each column to get all of the combinations. You can use a regex pattern to specify multiple separators.
library(tidyr)
df %>%
separate_rows(b, sep = '/|,') %>%
separate_rows(c, sep = '/|,')
#> # A tibble: 14 × 2
#> b c
#> <chr> <chr>
#> 1 a 1
#> 2 d 2
#> 3 d 3
#> 4 d 4
#> 5 e 2
#> 6 e 3
#> 7 e 4
#> 8 f 2
#> 9 f 3
#> 10 f 4
#> 11 g 5
#> 12 g 6
#> 13 h 5
#> 14 h 6
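The pattern "/|," is a regular-expression alternation: it matches either a slash or a comma. A minimal base-R sketch, with the question's b values typed in directly, shows what that same pattern does to each string; the pieces are what end up as separate rows:

```r
# Regex alternation "/|," matches a slash OR a comma,
# so a single pattern covers both separators.
pieces <- strsplit(c("a", "d/e/f", "g,h"), split = "/|,")
print(pieces)

# Number of pieces per original value, i.e. how many rows each value expands into
print(lengths(pieces))
```

Each list element holds the pieces for one input row, and lengths() reports 1, 3 and 2 pieces for the three values.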
Maybe this helps: https://stackoverflow.com/questions/15347282/split-delimited-strings-in-a-column-and-insert-as-new-rows
For the first column only:
# Split b on either separator (regex alternation) and repeat c to match
s <- strsplit(df$b, split = ",|/")
data.frame(b = unlist(s), c = rep(df$c, lengths(s)))
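The strsplit approach handles one column at a time; applying it once per column, in the same spirit as the two separate_rows calls, produces all combinations. A minimal sketch, assuming a plain data.frame (df0 below is the question's data rebuilt as a data.frame) and a hypothetical helper named split_long that is not part of any package:

```r
# Hypothetical helper (not from any package): split one column of a
# data.frame on a regex and repeat the other columns to match.
split_long <- function(d, col, sep = ",|/") {
  pieces <- strsplit(as.character(d[[col]]), split = sep)
  # Repeat each original row once per piece, then overwrite the split column
  out <- d[rep(seq_len(nrow(d)), lengths(pieces)), , drop = FALSE]
  out[[col]] <- unlist(pieces)
  rownames(out) <- NULL
  out
}

df0 <- data.frame(
  b = c("a", "d/e/f", "g,h"),
  c = c("1", "2,3,4", "5/6"),
  stringsAsFactors = FALSE
)

# Split b first, then c, giving every b-piece paired with every c-piece
res <- split_long(split_long(df0, "b"), "c")
print(res)
```

One caveat of this sketch: a row whose value is an empty string would be dropped, because strsplit returns character(0) for "".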
An option with cSplit:
library(splitstackshape)
cSplit(df, "b", sep = "/|,", "long", fixed = FALSE) |>
cSplit("c", sep = "/|,", "long", fixed = FALSE)
Output:
b c
1: a 1
2: d 2
3: d 3
4: d 4
5: e 2
6: e 3
7: e 4
8: f 2
9: f 3
10: f 4
11: g 5
12: g 6
13: h 5
14: h 6
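cSplit defaults to fixed = TRUE, which treats sep as a literal string; that is why the answer passes fixed = FALSE, so that "/|," is read as a regex. The same literal-versus-regex distinction can be seen with base R's strsplit (used here only as an illustration, not as a description of cSplit's internals):

```r
x <- "d/e/f"

# fixed = TRUE: "/|," is looked for as the literal three-character string,
# which never occurs, so the value comes back unsplit
literal <- strsplit(x, split = "/|,", fixed = TRUE)[[1]]

# fixed = FALSE (the default): "/|," is a regex meaning "/ or ,"
as_regex <- strsplit(x, split = "/|,", fixed = FALSE)[[1]]

print(literal)
print(as_regex)
```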
A data.table option:
# option 1
foo = \(x) unlist(strsplit(x, ",|/"))
df[, do.call(CJ, lapply(.SD, foo)), .I][, !"I"]
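CJ returns the full cross join of the vectors it is given, so grouping by row means each input row yields (pieces in b) times (pieces in c) output rows. A small base-R check of that arithmetic on the question's values (b_vals and c_vals are just the question's columns typed in):

```r
sep <- ",|/"
b_vals <- c("a", "d/e/f", "g,h")
c_vals <- c("1", "2,3,4", "5/6")

# Rows produced per input row = pieces in b * pieces in c
per_row <- lengths(strsplit(b_vals, sep)) * lengths(strsplit(c_vals, sep))
print(per_row)
print(sum(per_row))
```

That is 1 + 9 + 4 = 14 rows, matching the 14 rows in the outputs above.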
Similarly, in base R:
sep <- ",|/"
Map(
  \(b, c) expand.grid(c = c, b = b, stringsAsFactors = FALSE)[c("b", "c")],
  strsplit(df$b, sep),
  strsplit(df$c, sep)
) |>
  do.call(rbind, args = _)
Result:
# b c
# <char> <char>
# 1: a 1
# 2: d 2
# 3: d 3
# 4: d 4
# 5: e 2
# 6: e 3
# 7: e 4
# 8: f 2
# 9: f 3
# 10: f 4
# 11: g 5
# 12: g 6
# 13: h 5
# 14: h 6
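In the base-R version, two expand.grid defaults matter: its first argument varies fastest, and stringsAsFactors is TRUE, so character inputs become factors. A small sketch of both behaviours, and of listing c first and then reordering the columns to get rows grouped by b as in the result above:

```r
# Defaults: b varies fastest and both columns become factors
g1 <- expand.grid(b = c("d", "e"), c = c("2", "3"))
str(g1)

# c listed first so it varies fastest (rows grouped by b),
# characters kept as characters, columns put back in b, c order
g2 <- expand.grid(c = c("2", "3"), b = c("d", "e"), stringsAsFactors = FALSE)[c("b", "c")]
print(g2)
```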