Using a regular expression, how can I add elements after I find a match in R?

I have a column of digit strings that are either 10 or 11 characters long. Here is an example of some of the values in the column:

column=c("5699420001","00409226602")

How can I place a hyphen after the first four digits (in the 10-character strings) or after the first five digits (in the 11-character strings), and again after the next four digits for both lengths? The desired output is shown below. I would like to use stringr for this.

column_standard=c("5699-4200-01","00409-2266-02")

Try using this as your expression:

\b(\d{4,5})(\d{4})(\d{2}\b)

It sets up three capture groups that you can later reference in your replacement to add hyphens between them.

Then you just replace with:

\1-\2-\3
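
To see what each of the three groups actually captures, here is a minimal sketch using stringr's str_match() with the same pattern written as an R string. The variable names `groups` and `rebuilt` are only illustrative.

```r
library(stringr)

column <- c("5699420001", "00409226602")

# str_match() returns a character matrix: column 1 holds the full match,
# columns 2 to 4 hold the three capture groups.
groups <- str_match(column, "\\b(\\d{4,5})(\\d{4})(\\d{2}\\b)")

# Joining the groups with hyphens mirrors the replacement \1-\2-\3.
rebuilt <- paste(groups[, 2], groups[, 3], groups[, 4], sep = "-")
rebuilt
```

The first group is greedy, so it tries five digits first and only falls back to four when the remaining digits cannot fill the other two groups. That is why a single pattern handles both lengths.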

Thanks to @Dunois for pointing out how it would look in code:

column_standard <- sapply(column, function(x) stringr::str_replace(x, "^(\\d{4,5})(\\d{4})(\\d{2})", "\\1-\\2-\\3"))

Here is a live example.

Here's a solution using capture groups with stringr's str_replace() function:

library(stringr)

column <- c("5699420001","00409226602")

# Pick the pattern by string length: 11 characters split 5-4-2, 10 split 4-4-2.
column_standard <- sapply(column, function(x){
  ifelse(nchar(x) == 11, 
         stringr::str_replace(x, "^([0-9]{5})([0-9]{4})(.*)", "\\1-\\2-\\3"),
         stringr::str_replace(x, "^([0-9]{4})([0-9]{4})(.*)", "\\1-\\2-\\3"))
})

column_standard

#     5699420001     00409226602 
# "5699-4200-01" "00409-2266-02"

The code should be fairly self-explanatory. I can provide a detailed explanation upon request.
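
Since str_replace() is vectorised over its input, the length check can also be folded into the pattern, which removes the need for sapply() and ifelse(). A minimal sketch, assuming every value consists of exactly 10 or 11 digits; the `$` anchor and the name `column_standard_vec` are assumptions added for illustration:

```r
library(stringr)

column <- c("5699420001", "00409226602")

# {4,5} tries five digits first and falls back to four when the remaining
# digits cannot fill the 4 + 2 groups that must end at `$`.
column_standard_vec <- str_replace(column, "^(\\d{4,5})(\\d{4})(\\d{2})$", "\\1-\\2-\\3")

column_standard_vec
```

Unlike the sapply() version, this returns an unnamed character vector, and any string that does not match the pattern is returned unchanged.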
