Remove one digit at position n of each number in a string of numbers separated by slashes

I have a character column with this configuration:

data <- data.frame(
  id = 1:3,
  codes = c("08001301001", "08002401002 / 08002601003 / 17134604034", "08004701005 / 08005101001"))

I want to remove the 6th digit of every number within the string. The numbers are always 10 characters long.

My code works. However, I believe it could be done more simply with a regex, but I couldn't figure it out.

library(stringr)

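remove_6_digit <- function(x){
  # Positions of every slash; the 6th digit of the next number sits 7 characters later
  idxs <- str_locate_all(x,"/")[[1]][,1]

  # Delete from right to left so that earlier positions do not shift
  for (idx in c(rev(idxs+7), 6)){
      str_sub(x, idx, idx) <- ""
  }
  return(x)
}

result <- sapply(data$codes, remove_6_digit, USE.NAMES = FALSE)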

You can use

gsub("\\b(\\d{5})\\d", "\\1", data$codes)

See the regex demo. This will remove the 6th digit from the start of each digit sequence.

Details:

  • \b - word boundary
  • (\d{5}) - Capturing group 1 (\1): five digits
  • \d - a digit.
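As a quick check, the pattern can be run on the sample strings from the question. This is only a sketch: the vector name `codes` and the variable `result` are chosen here for illustration and are not part of the original answer.

```r
# Sample strings copied from the question's data frame
codes <- c("08001301001",
           "08002401002 / 08002601003 / 17134604034",
           "08004701005 / 08005101001")

# Keep the first five digits of each number (group 1) and drop the sixth
result <- gsub("\\b(\\d{5})\\d", "\\1", codes)

print(result)
# [1] "0800101001"
# [2] "0800201002 / 0800201003 / 1713404034"
# [3] "0800401005 / 0800501001"
```

Only the first six digits of each number are consumed by a match, and there is no word boundary inside the remaining digits, so each number loses exactly one digit.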

While a word boundary is enough for the current scenario, a digit boundary is also an option in case the numbers are glued to word characters:

gsub("(?<!\\d)(\\d{5})\\d", "\\1", data$codes, perl=TRUE)

where perl=TRUE enables the PCRE regex syntax and (?<!\d) is a negative lookbehind that fails the match if there is a digit immediately to the left of the current location.
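To illustrate the difference, here is a small sketch with a made-up input in which the first number is glued to letters. The `ID` prefix and the variable names are assumptions for the example only. Both calls use perl = TRUE so that the boundary is the only thing that differs.

```r
# Hypothetical input: the first number is prefixed with letters, the second is not
x <- "ID08001301001 / 08002401002"

# \b needs a word/non-word transition, and "D" followed by "0" is not one,
# so the first number is left untouched
word_boundary <- gsub("\\b(\\d{5})\\d", "\\1", x, perl = TRUE)

# (?<!\d) only requires that the preceding character is not a digit
digit_boundary <- gsub("(?<!\\d)(\\d{5})\\d", "\\1", x, perl = TRUE)

print(word_boundary)
# [1] "ID08001301001 / 0800201002"
print(digit_boundary)
# [1] "ID0800101001 / 0800201002"
```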

And if you must only change digit sequences of exactly 10 digits (no shorter and no longer), you can use

gsub("\\b(\\d{5})\\d(\\d{4})\\b", "\\1\\2", data$codes)
gsub("(?<!\\d)(\\d{5})\\d(?=\\d{4}(?!\\d))", "\\1", data$codes, perl=TRUE)

One remark though: your numbers consist of 11 digits, so you need to replace \\d{4} with \\d{5}; see this regex demo.
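With that adjustment applied, the two strict variants might look like the sketch below. The wrapper names `strict_word` and `strict_digit` and the test strings are made up for illustration.

```r
# Strict variants adjusted for 11-digit numbers (5 kept + 1 removed + 5 kept)
strict_word <- function(x) {
  gsub("\\b(\\d{5})\\d(\\d{5})\\b", "\\1\\2", x)
}
strict_digit <- function(x) {
  gsub("(?<!\\d)(\\d{5})\\d(?=\\d{5}(?!\\d))", "\\1", x, perl = TRUE)
}

# An 11-digit, a 10-digit and a 12-digit number; only the first should change
x <- "08001301001 / 0800130100 / 080013010011"

print(strict_word(x))
# [1] "0800101001 / 0800130100 / 080013010011"
print(strict_digit(x))
# [1] "0800101001 / 0800130100 / 080013010011"
```

Numbers that are not exactly 11 digits long pass through unchanged, which is the point of anchoring both ends of the match.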

Another possible solution, using stringr::str_replace_all and lookarounds:

library(tidyverse)

data %>% 
  mutate(codes = str_replace_all(codes, "(?<=\\d{5})\\d(?=\\d{5})", ""))

#>   id                                codes
#> 1  1                           0800101001
#> 2  2 0800201002 / 0800201003 / 1713404034
#> 3  3              0800401005 / 0800501001
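Since the title asks about position n in general, the same idea can be wrapped in a small helper. This is a sketch only: `remove_nth_digit` is a hypothetical name, not a function from base R or stringr, and it builds the digit-boundary pattern shown earlier with the count n - 1 filled in via sprintf.

```r
# Hypothetical helper: drop the n-th digit of every digit sequence in x.
# Sequences with fewer than n digits are left unchanged.
remove_nth_digit <- function(x, n) {
  stopifnot(is.numeric(n), length(n) == 1, n >= 1)
  # (?<!\d) anchors at the start of a digit run; group 1 keeps the first n - 1 digits
  pattern <- sprintf("(?<!\\d)(\\d{%d})\\d", as.integer(n) - 1L)
  gsub(pattern, "\\1", x, perl = TRUE)
}

print(remove_nth_digit("08001301001 / 17134604034", 6))
# [1] "0800101001 / 1713404034"
print(remove_nth_digit("08001301001 / 17134604034", 1))
# [1] "8001301001 / 7134604034"
print(remove_nth_digit("12345", 6))
# [1] "12345"
```

The lookbehind keeps the match from restarting in the middle of a number, so each digit sequence loses at most one digit.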
