
Remove leading zeros in middle of string with regex

I have a large number of strings in the format YYYYYYYYXXXXXXXXZZZZZZZZ, where Y, X, and Z are fixed-length numbers of eight digits each. I need to extract the middle sequence of digits and remove any leading zeros. Unfortunately, the only way to determine where each of the three sequences begins and ends is to count the digits.

I am currently doing it in two steps, i.e.:

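import re

m = re.match(
    r"(?P<first_sequence>\d{8})"
    r"(?P<second_sequence>\d{8})"
    r"(?P<third_sequence>\d{8})",
    string)
second_sequence = m.group("second_sequence")
# lstrip() takes a string of characters to strip and returns a new string
second_sequence = second_sequence.lstrip("0")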

This works and gives me the right results, e.g.:

112233441234567855667788 --> 12345678
112233440012345655667788 --> 123456
112233001234567855667788 --> 12345678
112233000012345655667788 --> 123456

But is there a better method? Is it possible to write a single regular expression that matches the second sequence without the leading zeros?

I guess I am looking for a regex which does the following:

  1. Skips over the first eight digits.
  2. Skips any leading zeros.
  3. Captures anything after that, up to the point where there are sixteen characters behind and eight in front.

The above solution does work, as mentioned, so the purpose of this problem is more to improve my knowledge of regex. I appreciate any pointers.

This is a typical case of "useless use of regular expressions".

Your strings are fixed-length. Just cut them at the appropriate positions.

s = "112233440012345655667788"
int(s[8:16])
# -> 123456

I think it is simpler without a regular expression.

result = my_str[8:16].lstrip('0')
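
The two slicing answers above differ in what they return, which matters when the middle sequence is all zeros. The following is a minimal sketch of that difference; the function names `middle_as_int` and `middle_as_str` are made up for illustration, and the input is assumed to be exactly 24 digits.

```python
def middle_as_int(s):
    # int() ignores leading zeros and returns a number
    return int(s[8:16])


def middle_as_str(s):
    # lstrip("0") keeps the result as a string
    return s[8:16].lstrip("0")


s = "112233440012345655667788"
print(middle_as_int(s))  # 123456
print(middle_as_str(s))  # 123456

# Edge case: a middle sequence consisting only of zeros
zeros = "112233440000000055667788"
print(middle_as_int(zeros))        # 0
print(repr(middle_as_str(zeros)))  # ''
```

If an empty string is not acceptable for the all-zero case, `int()` is probably the safer choice.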

I agree with the other answers here that a regex isn't really required. If you really want to use one, then \d{8}0*(\d*)\d{8} should do it.
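
As a rough sketch, the pattern suggested above can be written as a Python raw string and applied with `re.fullmatch`. The helper name `middle_digits` is hypothetical, and the code assumes each input is exactly 24 digits; the pattern itself does not enforce that length.

```python
import re

# 8 digits, any leading zeros, the captured remainder, then the final 8 digits
PATTERN = re.compile(r"\d{8}0*(\d*)\d{8}")


def middle_digits(s):
    # fullmatch anchors the pattern at both ends of the string
    m = PATTERN.fullmatch(s)
    if m is None:
        raise ValueError("input does not look like three digit sequences")
    return m.group(1)


for s in ("112233441234567855667788",
          "112233440012345655667788",
          "112233001234567855667788",
          "112233000012345655667788"):
    print(s, "-->", middle_digits(s))
```

The greedy `(\d*)` first grabs every remaining digit and then backtracks until eight digits are left for the trailing `\d{8}`, which is what pins the capture to the middle sequence.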

Just to show that it is possible with regex:

https://regex101.com/r/8RSxaH/2

# CODE AUTO GENERATED BY REGEX101.COM (SEE LINK ABOVE)
# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"(?<=\d{8})((?:0*)(\d{,8}))(?=\d{8})"

test_str = ("112233441234567855667788\n"
    "112233440012345655667788\n"
    "112233001234567855667788\n"
    "112233000012345655667788")

matches = re.finditer(regex, test_str)

for matchNum, match in enumerate(matches):
    matchNum = matchNum + 1

    print ("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))

    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1

        print ("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.

That said, you don't really need a regex to do what you're asking.
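
For a single string, a trimmed-down variant of the lookaround idea could look like the sketch below. It is an illustration rather than the generated script: the pattern drops the outer group, the helper name `middle_via_lookaround` is invented, and the input is assumed to be exactly 24 digits, since the lookarounds alone do not check the total length.

```python
import re

# The lookbehind requires 8 digits before the match and the lookahead
# requires 8 digits after it; the group excludes the leading zeros.
MIDDLE = re.compile(r"(?<=\d{8})0*(\d{0,8})(?=\d{8})")


def middle_via_lookaround(s):
    # search() returns the first position where the pattern matches
    m = MIDDLE.search(s)
    return m.group(1) if m else None


print(middle_via_lookaround("112233440012345655667788"))  # 123456
```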
