How to remove leading zeros from two-digit numbers (01, 02, etc.) in the middle of a string using R?

Question

For the following string vector s, I want to remove the leading zeros in each element, which is the reverse of the answer from this link:

s <- c('week 01st', 'weeks 02nd', 'year2022week01st', 'week 4th')

The expected result looks like this:

s <- c('week 1st', 'weeks 2nd', 'year2022week1st', 'week 4th')

I tested the following code, but it does not work because the regex is incomplete:

s <- 'week 01st'
sub('^0+(?=[1-9])', '', s, perl=TRUE)
sub('^0+([1-9])', '\\1', s)

Out:

[1] "week 01st"

How could I do that using R?

Update: the following code contributed by @dvantwisk works for year2022week01st, but not for the other elements:

s <- c('week 01st', 'weeks 02nd', 'year2022week01st', 'week 4th')
gsub('(year[0-9]{4,})(week)(0{0,})([1-9]{1})([0-9a-zA-Z]{1,})', '\\1\\2\\4\\5', s)

Out:

[1] "week 01st"       "weeks 02nd"      "year2022week1st" "week 4th"

Answer 1

You might use:

weeks?\h*\K0+(?=[1-9]\d*[a-zA-Z])

The pattern matches:

weeks? Match week with an optional s
\h*\K Match optional horizontal whitespace and forget what has been matched so far
0+ Match one or more zeros
(?=[1-9]\d*[a-zA-Z]) Positive lookahead: assert that a digit 1-9, optional further digits, and a letter a-zA-Z follow directly to the right
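
To see which part of each string the pattern actually consumes, here is a minimal sketch using base R's regexpr() and regmatches(). The variable names pattern, m and matched are only illustrative. Because of \K and the lookahead, only the zeros themselves should end up in the match, so replacing the match with an empty string leaves the rest of the string intact.

```r
s <- c('week 01st', 'weeks 02nd', 'year2022week01st', 'week 4th')
pattern <- "weeks?\\h*\\K0+(?=[1-9]\\d*[a-zA-Z])"

# perl = TRUE is needed: \h, \K and lookaheads are PCRE features
m <- regexpr(pattern, s, perl = TRUE)

# Matched text for the elements that match;
# 'week 4th' has no leading zero, so it contributes nothing
matched <- regmatches(s, m)
print(matched)
```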

See a regex demo and an R demo.

In the replacement, use an empty string.

For example:

s <- c('week 01st', 'weeks 02nd', 'year2022week01st', 'week 4th')
gsub("weeks?\\h*\\K0+(?=[1-9]\\d*[a-zA-Z])", '', s, perl=TRUE)

Output

[1] "week 1st"        "weeks 2nd"       "year2022week1st" "week 4th"
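
As a quick check on inputs that are not in the question, the sketch below runs the same pattern on a few made-up strings (the vector extra is hypothetical). Under this pattern, a number such as 10 should be left alone because it does not start with a zero, several leading zeros should be stripped at once, and a number made only of zeros should stay untouched because the lookahead requires a digit 1-9.

```r
pattern <- "weeks?\\h*\\K0+(?=[1-9]\\d*[a-zA-Z])"

# Hypothetical inputs, not taken from the question
extra <- c('week 10th', 'week 007th', 'week 00th', 'weeks  03rd')

# Same call as above: drop the matched zeros, keep everything else
cleaned <- gsub(pattern, '', extra, perl = TRUE)
print(cleaned)
```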

Or with 2 capture groups:

(weeks?\h*)0+([1-9]\d*[a-zA-Z])

Example:

s <- c('week 01st', 'weeks 02nd', 'year2022week01st', 'week 4th')
gsub("(weeks?\\h*)0+([1-9]\\d*[a-zA-Z])", '\\1\\2', s, perl=TRUE)

Output

[1] "week 1st"        "weeks 2nd"       "year2022week1st" "week 4th"

Answer 2

gsub('(week )(0{0,})([1-9]{1})([0-9a-zA-Z]{1,})', '\\1\\3\\4', week_string)

gsub() takes three arguments as input: a pattern, a replacement, and a query character vector. Our strategy is to create a regular expression with four groups using ().

We first match 'week '.

We then match zero or more zeros with the expression (0{0,}). The first zero is the character we are trying to match, and the expression {0,} means we match it zero (hence the 0) or more (hence the comma) times.

Our third group matches any digit from 1 to 9 exactly once.

Our fourth group matches any digit from 0 to 9 or any letter, one or more times.

Our replacement is '\\1\\3\\4'. This indicates we want to keep groups one, three, and four in our result and drop group two, the leading zeros. Thus the output is:

[1] "week 1st" "week 2nd" "week 3rd" "week 4th"
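
The answer does not define week_string, so the sketch below uses an assumed input chosen to be consistent with the output shown above. It also applies the same pattern to the vector s from the question, which suggests that this pattern only handles elements containing the literal 'week ' (with a trailing space).

```r
# Assumed input: the answer does not define week_string
week_string <- c('week 01st', 'week 02nd', 'week 03rd', 'week 4th')
pattern <- '(week )(0{0,})([1-9]{1})([0-9a-zA-Z]{1,})'

# Keep groups 1, 3 and 4; group 2 (the zeros) is dropped
res_assumed <- gsub(pattern, '\\1\\3\\4', week_string)
print(res_assumed)

# The vector from the question: 'weeks 02nd' and 'year2022week01st'
# do not contain 'week ' followed by a digit, so they are unchanged
s <- c('week 01st', 'weeks 02nd', 'year2022week01st', 'week 4th')
res_question <- gsub(pattern, '\\1\\3\\4', s)
print(res_question)
```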
