
Regex to remove leading zeros in R, unless the final (or only) character is zero

gsub("(?<![0-9])0+", "", c("005", "0AB", "000", "0"), perl = TRUE)
#> [1] "5"  "AB" ""   ""
gsub("(^|[^0-9])0+", "\\1", c("005", "0AB", "000", "0"), perl = TRUE)
#> [1] "5"  "AB" ""   ""

The regular expressions above are from this SO thread explaining how to remove all leading zeros from a string in R. As a consequence, both "000" and "0" are transformed into "". Instead, I want to remove all leading zeros from a string, except when the final character happens to be zero or the only character is zero.

"005" would become "5"
"0AB" would become "AB"
"000" would become "0"
"0"   would become "0"

This other SO thread explains how to do what I want, but I don't think I'm getting the syntax quite right when applying the solution in R. I also don't really understand the distinction between the 1st and 2nd solutions below (assuming they worked).

gsub("s/^0*(\d+)$/$1/;", "", c("005", "0AB", "000", "0"), perl = TRUE)  # 1st solution
# Error: '\d' is an unrecognized escape in character string starting ""s/^0*(\d"
gsub("s/0*(\d+)/$1/;", "", c("005", "0AB", "000", "0"), perl = TRUE)    # 2nd solution
# Error: '\d' is an unrecognized escape in character string starting ""s/0*(\d"

What is the proper regex in R to get what I want?
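
A note on the two attempts above, offered as a hedged sketch rather than as the linked thread's method: s/pattern/replacement/ is Perl's substitution operator, not part of the regex itself. In R, the pattern and the replacement are separate arguments to sub(), backslashes inside a string literal must be doubled (which is what the '\d' parse error is about), and $1 is written "\\1". The vector x and the extra input "x007" are assumptions for illustration.

```r
x <- c("005", "0AB", "000", "0")

# 1st solution: anchored, so it only rewrites strings made up entirely of digits
sub("^0*(\\d+)$", "\\1", x, perl = TRUE)
#> [1] "5"   "0AB" "0"   "0"

# 2nd solution: unanchored, so it rewrites the first run of digits wherever it occurs
sub("0*(\\d+)", "\\1", x, perl = TRUE)
#> [1] "5"   "0AB" "0"   "0"

# The two differ once the digits are preceded by other characters
sub("^0*(\\d+)$", "\\1", "x007", perl = TRUE)
#> [1] "x007"
sub("0*(\\d+)", "\\1", "x007", perl = TRUE)
#> [1] "x7"
```

Neither translation strips the zero from "0AB", because \d+ has to match at least one digit and the only digit available there is the leading zero itself.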

You may remove all zeros from the start of a string but not the last one:

sub("^0+(?!$)", "", x, perl=TRUE)

See the regex demo.

Details

  • ^ - start of a string
  • 0+ - one or more zeros
  • (?!$) - a negative lookahead that fails the match if there is an end of string position immediately to the right of the current location

See the R demo:

x <- c("005", "0AB", "000", "0")
sub("^0+(?!$)", "", x, perl=TRUE)
## => [1] "5"  "AB" "0"  "0"
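
To see how the pattern behaves beyond the four inputs from the question, here is a small sketch with a few extra values. The helper name strip_leading_zeros and the extra inputs are assumptions for illustration, not part of the original answer.

```r
# Hypothetical helper wrapping the pattern above
strip_leading_zeros <- function(x) sub("^0+(?!$)", "", x, perl = TRUE)

# No leading zero, a zero before a decimal point, an empty string, a missing value
strip_leading_zeros(c("100", "0.5", "", NA))
#> [1] "100" ".5"  ""    NA
```

Only zeros at the very start of the string are removed. The zero in front of a decimal point is dropped as well, which may or may not be what you want.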

We can add one more condition with a regex lookahead that requires a non-zero character after the one or more zeros (0+). The inner sub() first reduces an all-zero string to a single "0".

sub("(?<![0-9])0+(?=[^0])", "", sub("^0+$", "0", v1), perl = TRUE)
#[1] "5"  "AB" "0"  "0" 

data

v1 <- c("005", "0AB", "000", "0")

Use a non-word boundary, \\B. See this demo at regex101 or an R demo at tio.run.

s <- c("005", "0AB", "000", "0")
sub("^0+\\B", "", s)

This will not match the last zero, because there is no word character to the right of it.
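
One difference from a lookahead such as (?!$) is worth noting: \\B requires a word character (letter, digit or underscore) after the zeros, so zeros followed by punctuation are left alone. A minimal sketch, with perl = TRUE and the extra input "0.5" as illustrative assumptions:

```r
x <- c("005", "0AB", "000", "0", "0.5")

# Non-word boundary: the zero before "." is kept
sub("^0+\\B", "", x, perl = TRUE)
#> [1] "5"   "AB"  "0"   "0"   "0.5"

# Negative lookahead for end of string: the zero before "." is removed
sub("^0+(?!$)", "", x, perl = TRUE)
#> [1] "5"  "AB" "0"  "0"  ".5"
```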

You could use an alternation that either matches a string consisting only of zeroes, capturing the final zero in group 1, or matches all the zeroes at the start of the string.

In the replacement, use group 1.

^0*(0)$|^0+

Regex demo | R demo

For example

sub("^0*(0)$|^0+", "\\1", c("005", "0AB", "000", "0"))

Output

[1] "5"  "AB" "0"  "0"

Or, even better, as Wiktor Stribiżew commented, you could capture a single 0 in a group and repeat the group itself, so that the group holds the last zero matched.

^(0)+$|^0+

Regex demo
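
In R, this variant can be used the same way. A minimal sketch, assuming the input vector from the question and perl = TRUE (the engine choice is an assumption made here, not part of the original comment):

```r
x <- c("005", "0AB", "000", "0")

# Group 1 holds a zero only when the whole string is zeros; otherwise it is empty
sub("^(0)+$|^0+", "\\1", x, perl = TRUE)
#> [1] "5"  "AB" "0"  "0"
```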

Another regex option:

^0*(.+)$

Here's a regex demo.

Using base::sub in R:

sub("^0*(.+)$", "\\1", c("005", "0AB", "000", "0"))  

 ## [1] "5"  "AB" "0"  "0" 

Here's an R demo.

Or, expanding on @akrun's answer:

sub("^$", "0", sub("^0+", "", c("005", "0AB", "000", "0")), perl = TRUE)
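
A caveat for the two-step approach, sketched under the assumption that empty strings may occur in the input (the vector below is illustrative): because the outer sub() turns every empty result into "0", a string that was empty to begin with also becomes "0".

```r
x <- c("005", "000", "")

# Two-step approach: strip all leading zeros, then put "0" back for empty results
sub("^$", "0", sub("^0+", "", x), perl = TRUE)
#> [1] "5" "0" "0"

# Single-pass lookahead pattern: the empty string is left untouched
sub("^0+(?!$)", "", x, perl = TRUE)
#> [1] "5" "0" ""
```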
