
Remove leading zeros in middle of string with regex

I have a large number of strings in the format YYYYYYYYXXXXXXXXZZZZZZZZ, where X, Y, and Z are numbers of fixed length, eight digits each. The problem is that I need to parse out the middle sequence of digits and remove any leading zeros. Unfortunately, the only way to determine where each of the three sequences begins and ends is to count the digits.

I am currently doing it in two steps, i.e.:

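import re

m = re.match(
    r"(?P<first_sequence>\d{8})"
    r"(?P<second_sequence>\d{8})"
    r"(?P<third_sequence>\d{8})",
    string)
second_sequence = m.group("second_sequence")
# lstrip() takes a string of characters to strip and returns a new string
second_sequence = second_sequence.lstrip("0")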

This does work and gives me the right results, e.g.:

112233441234567855667788 --> 12345678
112233440012345655667788 --> 123456
112233001234567855667788 --> 12345678
112233000012345655667788 --> 123456

But is there a better method? Is it possible to write a single regex which matches the second sequence, sans the leading zeros?

I guess I am looking for a regex which does the following:

  1. Skips over the first eight digits.
  2. Skips any leading zeros.
  3. Captures anything after that, up to the point where there are sixteen characters behind and eight in front.

The above solution does work, as mentioned, so the purpose of this question is more to improve my knowledge of regex. I appreciate any pointers.

This is a typical case of "useless use of regular expressions".

Your strings are fixed-length. Just cut them at the appropriate positions.

s = "112233440012345655667788"
int(s[8:16])
# -> 123456

I think it is simpler not to use a regex.

result = my_str[8:16].lstrip('0')
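As a small sketch of the slicing approach, the two variants above can be compared side by side on the sample strings from the question. The helper name `middle_field` is made up for illustration. Note the difference in result type: `int(...)` returns an integer, while `lstrip('0')` returns a string, which ends up empty when the middle field is all zeros.

```python
def middle_field(s: str) -> str:
    """Return characters 8..15 of s (the middle field) without leading zeros."""
    return s[8:16].lstrip("0")


samples = [
    "112233441234567855667788",
    "112233440012345655667788",
    "112233001234567855667788",
    "112233000012345655667788",
]

for s in samples:
    # Same field, once as a stripped string and once as an integer.
    print(s, "-->", middle_field(s), int(s[8:16]))
```

Which of the two fits better probably depends on whether the field is used as a number or as an identifier.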

I agree with the other answers here that a regex isn't really required. If you really want to use one, then \\d{8}0*(\\d*)\\d{8} should do it.
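For illustration, here is how that pattern might be used from Python. This is only a sketch, not the answerer's code: it assumes the doubled backslashes above come from a non-raw string literal, so the raw-string form uses single backslashes, and it assumes the input is already known to be a 24-digit string. The function name `middle_without_zeros` is hypothetical.

```python
import re

# Raw-string form of the pattern: 8 digits, any leading zeros,
# the captured remainder of the middle field, then the final 8 digits.
PATTERN = re.compile(r"\d{8}0*(\d*)\d{8}")


def middle_without_zeros(s):
    """Return the middle field of a 24-digit string without leading zeros."""
    m = PATTERN.match(s)
    if m is None:
        raise ValueError("input does not look like a digit string: %r" % s)
    return m.group(1)


for s in ("112233441234567855667788", "112233000012345655667788"):
    print(s, "-->", middle_without_zeros(s))
```

The greedy `\d*` first grabs everything up to the end of the string and then backtracks until eight digits are left for the trailing `\d{8}`, which is what pins the capture to the middle field.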

Just to show that it is possible with regex:

https://regex101.com/r/8RSxaH/2

# CODE AUTO GENERATED BY REGEX101.COM (SEE LINK ABOVE)
# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"(?<=\d{8})((?:0*)(\d{,8}))(?=\d{8})"

test_str = ("112233441234567855667788\n"
    "112233440012345655667788\n"
    "112233001234567855667788\n"
    "112233000012345655667788")

matches = re.finditer(regex, test_str)

for matchNum, match in enumerate(matches):
    matchNum = matchNum + 1

    print ("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))

    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1

        print ("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.

Although you don't really need it to do what you're asking.
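For a single string, the same lookaround idea can be written more compactly. The sketch below is a trimmed-down variant, not the regex101 output: it drops the outer group and uses `re.search`, which returns only the first match, so the zero-stripped field is group 1. It assumes the input is exactly 24 digits; the name `FIELD` is made up for illustration.

```python
import re

# The lookbehind requires 8 digits immediately before the match,
# 0* consumes leading zeros, group 1 captures the remaining digits,
# and the lookahead requires 8 digits right after the match.
FIELD = re.compile(r"(?<=\d{8})0*(\d{,8})(?=\d{8})")

s = "112233000012345655667788"
m = FIELD.search(s)
print(m.group(1))  # 123456
```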


 