Remove the digit at position n from each number in a slash-separated string of numbers

Question

I have a character column with this configuration:

data <- data.frame(
  id = 1:3,
  codes = c("08001301001", "08002401002 / 08002601003 / 17134604034", "08004701005 / 08005101001"))

I want to remove the 6th digit of every number within the string. The numbers are always 10 characters long.

My code works. However, I believe it could be done more easily with a regex, but I couldn't figure it out.

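library(stringr)

remove_6_digit <- function(x) {
  # Positions of every "/" separator in the string
  idxs <- str_locate_all(x, "/")[[1]][, 1]

  # Each later number starts 2 characters after its "/", so its 6th digit
  # sits at idx + 7. Delete from right to left so earlier positions stay valid.
  for (idx in c(rev(idxs + 7), 6)) {
    str_sub(x, idx, idx) <- ""
  }
  return(x)
}

result <- sapply(data$codes, remove_6_digit, USE.NAMES = FALSE)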

Answer 1

You can use

gsub("\\b(\\d{5})\\d", "\\1", data$codes)

See the regex demo. This removes the 6th digit counted from the start of each digit sequence.

Details:

\b - word boundary
(\d{5}) - capturing group 1 (\1): five digits
\d - a digit.

While a word boundary is enough for the current scenario, a digit boundary is also an option in case the numbers are glued to word characters:

gsub("(?<!\\d)(\\d{5})\\d", "\\1", data$codes, perl=TRUE)

where perl=TRUE enables the PCRE regex syntax and (?<!\d) is a negative lookbehind that fails the match if there is a digit immediately to the left of the current location.
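To see where the two boundaries differ, here is a small sketch. The input string glued is made up for illustration and is not part of the question's data; both calls use perl = TRUE so that they run under the same regex engine.

```r
# Hypothetical input: an 11-digit code glued directly to letters.
glued <- "ID08002401002"

# \b needs a word/non-word transition. "D" and "0" are both word
# characters, so there is no boundary before the digits and nothing matches.
with_word_boundary <- gsub("\\b(\\d{5})\\d", "\\1", glued, perl = TRUE)

# (?<!\d) only requires that the previous character is not a digit,
# so the match can start right after "ID".
with_digit_boundary <- gsub("(?<!\\d)(\\d{5})\\d", "\\1", glued, perl = TRUE)

print(with_word_boundary)   # "ID08002401002" (unchanged)
print(with_digit_boundary)  # "ID0800201002"
```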

And if you must only change digit sequences of exactly 10 digits (no shorter and no longer), you can use

gsub("\\b(\\d{5})\\d(\\d{4})\\b", "\\1\\2", data$codes)
gsub("(?<!\\d)(\\d{5})\\d(?=\\d{4}(?!\\d))", "\\1", data$codes, perl=TRUE)

One remark though: your numbers consist of 11 digits, not 10, so you need to replace \\d{4} with \\d{5} in the patterns above, see this regex demo.
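Applied to the sample data from the question, the exact-length variant with \\d{5} might look like the sketch below. The vector name codes and the extra 10-digit input are assumptions added for illustration.

```r
# Sample values copied from the question's data frame.
codes <- c("08001301001",
           "08002401002 / 08002601003 / 17134604034",
           "08004701005 / 08005101001")

# Exactly 11 digits: 5 digits kept, 1 digit dropped, then exactly 5 more
# digits that are not followed by another digit.
pattern <- "(?<!\\d)(\\d{5})\\d(?=\\d{5}(?!\\d))"
fixed <- gsub(pattern, "\\1", codes, perl = TRUE)
print(fixed)

# Illustrative 10-digit input: too short for the pattern, so it is left alone.
short_code <- gsub(pattern, "\\1", "0800130100", perl = TRUE)
print(short_code)
```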

Answer 2

Another possible solution, using stringr::str_replace_all and lookaround:

library(tidyverse)

data %>% 
  mutate(codes = str_replace_all(codes, "(?<=\\d{5})\\d(?=\\d{5})", ""))

#>   id                                codes
#> 1  1                           0800101001
#> 2  2 0800201002 / 0800201003 / 1713404034
#> 3  3              0800401005 / 0800501001

Answer 1
3 Accepted 2022-03-04 10:10:33

Answer 2
3 2022-03-04 10:22:02