How to separate numbers (including the dot decimal separator) from letters in a `tidyr::separate` regex?

Question

How can I separate numbers (including the dot decimal separator) from letters with a tidyr::separate regex? In my current attempts, the first letter of the second string gets chopped off.

Reprex:

df <- data.frame(x = c("24.1234AAA", "14.4321BBB"))
df
#>            x
#> 1 24.1234AAA
#> 2 14.4321BBB

# This works but it is missing the first letter of the string
tidyr::separate(df, x, c("part1", "part2"), sep = "[^0-9 | {.}]", extra = "merge", convert = TRUE)
#>     part1 part2
#> 1 24.1234    AA
#> 2 14.4321    BB

# This gets the letter string completely, but not the numbers
tidyr::separate(df, x, c("part1", "part2"), sep = "([0-9.]+)", extra = "merge", convert = TRUE)
#>   part1 part2
#> 1    NA   AAA
#> 2    NA   BBB

Created on 2022-12-31 with reprex v2.0.2

Note: the numbers and letters are not always the same length, so we cannot use a numeric vector for the sep argument of tidyr::separate.

Answer 1

Use a regex lookaround to split at the position between a digit (\\d) and a letter ([A-Z]):

tidyr::separate(df, x, c("part1", "part2"), 
    sep = "(?<=\\d)(?=[A-Z])", extra = "merge", convert = TRUE)

-output

    part1 part2
1 24.1234   AAA
2 14.4321   BBB
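
To see why the two attempts in the question misbehave, it can help to look at what each sep pattern actually matches, since separate() drops whatever the separator matches. The following is a minimal diagnostic sketch in base R (regexpr and regmatches rather than tidyr, so it runs without extra packages); the variable names are only illustrative.

```r
# Same sample strings as in the question.
x <- c("24.1234AAA", "14.4321BBB")

# Attempt 1: the negated class matches the first character that is not a
# digit, a space, "|", "{", "." or "}", i.e. the first letter. That letter
# is treated as the separator, so it is removed from the result.
m1 <- regexpr("[^0-9 | {.}]", x)
sep1 <- regmatches(x, m1)            # "A" "B"

# Attempt 2: the pattern matches the number itself, starting at position 1,
# so the piece before the separator is empty (consistent with the NA shown
# in the question when convert = TRUE).
m2 <- regexpr("([0-9.]+)", x)
sep2 <- regmatches(x, m2)            # "24.1234" "14.4321"

# Lookaround: the match lies between the last digit and the first letter
# and has length 0, so no character is consumed by the split.
m3 <- regexpr("(?<=\\d)(?=[A-Z])", x, perl = TRUE)
pos3 <- as.integer(m3)               # 8 8
len3 <- attr(m3, "match.length")     # 0 0
```

Because the lookaround match is zero-width, both the full number and the full letter string survive the split.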

Or use extract with capture groups:

tidyr::extract(df, x, c("part1", "part2"), "^([0-9.]+)(\\D+)", convert = TRUE)
    part1 part2
1 24.1234   AAA
2 14.4321   BBB
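
The capture-group idea also carries over to base R if tidyr is not available. As a rough sketch, utils::strcapture() applies a pattern with capture groups and converts each group to the column type given in a prototype data frame. The third input string below is a made-up example, added only to show that the parts need not have fixed lengths.

```r
# Two strings from the question plus one hypothetical, shorter string.
x <- c("24.1234AAA", "14.4321BBB", "7.5C")

# Group 1: leading run of digits and dots; group 2: the non-digit rest.
# The prototype data frame names the output columns and sets their types.
res <- utils::strcapture(
  "^([0-9.]+)(\\D+)",
  x,
  proto = data.frame(part1 = numeric(), part2 = character()),
  perl = TRUE
)

res$part1   # 24.1234 14.4321  7.5000
res$part2   # "AAA" "BBB" "C"
```

Like tidyr::extract with convert = TRUE, this gives a numeric first column, but here the type is declared explicitly in proto instead of being guessed.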

Solution 1
4 Accepted 2022-12-31 20:13:37