How to separate numbers (including dot decimal separator) from letters in `tidyr::separate` regex?

How can I separate numbers (including the dot decimal separator) from letters with the sep regex of tidyr::separate? In my current attempts, the first letter of the second string gets chopped off.

Reprex:

df <- data.frame(x = c("24.1234AAA", "14.4321BBB"))
df
#>            x
#> 1 24.1234AAA
#> 2 14.4321BBB

# This works but it is missing the first letter of the string
tidyr::separate(df, x, c("part1", "part2"), sep = "[^0-9 | {.}]", extra = "merge", convert = TRUE)
#>     part1 part2
#> 1 24.1234    AA
#> 2 14.4321    BB

# This gets the letter string completely, but not the numbers
tidyr::separate(df, x, c("part1", "part2"), sep = "([0-9.]+)", extra = "merge", convert = TRUE)
#>   part1 part2
#> 1    NA   AAA
#> 2    NA   BBB

Created on 2022-12-31 with reprex v2.0.2

Note: the numbers and letters are not always the same length, so we cannot use a numeric vector for the sep argument of tidyr::separate.

Use a regex lookaround to split at the position between a digit (\\d) and a letter ([A-Z]). A lookaround matches a position rather than a character, so nothing is consumed by the split:

tidyr::separate(df, x, c("part1", "part2"), 
    sep = "(?<=\\d)(?=[A-Z])", extra = "merge", convert = TRUE)

-output

    part1 part2
1 24.1234   AAA
2 14.4321   BBB
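
If it helps to see why the lookaround keeps every character: whatever sep matches is thrown away by the split. Here is a minimal base-R sketch of that behaviour. It is my own illustration, not tidyr's code; split_first is a hypothetical helper, and I am assuming that splitting at the first PCRE match (perl = TRUE) is a close enough stand-in for separate with two columns and extra = "merge".

```r
x <- c("24.1234AAA", "14.4321BBB")

# Hypothetical helper, for illustration only: split each string at the first
# match of `pattern` and drop the matched text itself.
split_first <- function(x, pattern) {
  regmatches(x, regexpr(pattern, x, perl = TRUE), invert = TRUE)
}

# First attempt from the question: "[^0-9 | {.}]" is one character class
# (anything that is not a digit, space, "|", "{", "." or "}"), so the first
# letter is the match and gets dropped.
consumed <- split_first(x, "[^0-9 | {.}]")

# Second attempt: the number itself is the match, so the left piece is empty.
number_dropped <- split_first(x, "([0-9.]+)")

# Lookaround: a zero-width match between digit and letter, nothing is dropped.
kept <- split_first(x, "(?<=\\d)(?=[A-Z])")

print(consumed[[1]])
print(number_dropped[[1]])
print(kept[[1]])
```

The empty left piece in the second case is presumably what shows up as NA in the question's output once convert = TRUE is applied.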

Or use extract with capture groups, where each parenthesised group becomes one output column:

tidyr::extract(df, x, c("part1", "part2"), "^([0-9.]+)(\\D+)", convert = TRUE)
    part1 part2
1 24.1234   AAA
2 14.4321   BBB
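
To check what each capture group picks up before handing the pattern to extract, a rough base-R equivalent can be handy. This is only a sketch under my own assumptions: it uses regexec/regmatches with perl = TRUE rather than tidyr, and the names parts and res are made up for the example.

```r
x <- c("24.1234AAA", "14.4321BBB")

# regexec() reports the full match plus one entry per capture group;
# regmatches() then pulls those substrings out for each input string.
m <- regmatches(x, regexec("^([0-9.]+)(\\D+)", x, perl = TRUE))

# Stack the per-string results into a character matrix:
# column 1 = full match, column 2 = group 1, column 3 = group 2.
parts <- do.call(rbind, m)

# Convert the numeric part by hand, roughly what convert = TRUE does in extract.
res <- data.frame(
  part1 = as.numeric(parts[, 2]),
  part2 = parts[, 3],
  stringsAsFactors = FALSE
)
print(res)
```

Note that [0-9.]+ would also accept something like "1.2.3"; if that can occur in the data, a stricter numeric pattern may be worth considering.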
