
python regex for repeating string

I want to verify and then parse this string (in quotes):

string = "start: c12354, c3456, 34526; other stuff that I don't care about"
# Note that some codes begin with 'c'

I would like to verify that the string starts with 'start:' and ends with ';'. Afterward, I would like to use a regex to parse out the codes. I tried the following Python re code:

import re

regx = r"start: (c?[0-9]+,?\s*)+;"  # \s* lets the group step over the space after each comma
reg = re.compile(regx)
matched = reg.search(string)
print('matched.groups()', matched.groups())

I have tried different variations but I can either get the first or the last code but not a list of all three.

Or should I abandon using a regex?

EDIT: updated to reflect part of the problem space I neglected, and fixed a difference in the string. Thanks for all the suggestions, and in such a short time.

With Python's standard re module, this isn't possible with a single regular expression: each repetition of a capturing group overwrites the previous capture of that same group, so only the last one survives (in .NET this would actually be possible, since its engine distinguishes between captures and groups).
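A quick way to see the overwriting behaviour: in the sketch below the group matches three times, but only the text from its final repetition is reported. The pattern is the question's with \s* added so it can step over the spaces after the commas; that tweak is an assumption about the intended input.

```python
import re

text = "start: c12354, c3456, 34526; other stuff that I don't care about"

# The group (c?[0-9]+,?\s*) matches three times, but re keeps only
# the text captured by its last repetition.
m = re.search(r"start: (c?[0-9]+,?\s*)+;", text)
print(m.groups())  # ('34526',)
```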

Your easiest solution is to first extract the part between start: and ;, and then use a regular expression that returns all matches, not just a single match: re.findall(r'c?[0-9]+', text).
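A minimal sketch of that two-step approach on the question's string. Using str.partition to cut out the text between start: and the first ; is a choice made for this sketch, not the only way to do it; re.findall then returns every code, keeping the optional c prefix.

```python
import re

text = "start: c12354, c3456, 34526; other stuff that I don't care about"

codes = None
if text.startswith("start:"):
    # Step 1: take everything after "start:" up to the first ";".
    body, sep, _rest = text[len("start:"):].partition(";")
    if sep:  # sep is "" when the string contains no ";" at all
        # Step 2: findall returns every non-overlapping match, not just one.
        codes = re.findall(r"c?[0-9]+", body)

print(codes)  # ['c12354', 'c3456', '34526']
```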

You can use the standard string tools; they are almost always more readable.

s = "start: c12354, c3456, 34526;"

s.startswith("start:")  # True if s begins with "start:"
s.endswith(";")          # True if s ends with ";"
s[len("start:"):-1].strip().split(", ")  # ['c12354', 'c3456', '34526']
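The slice above assumes ";" is the very last character, but the question's real string has more text after it. A sketch that still uses only string methods but stops at the first ";" instead (parse_codes is a hypothetical helper name, and splitting on "," and stripping spaces is an assumption about the separator format):

```python
def parse_codes(line):
    """Hypothetical helper: return the codes between "start:" and the first ";",
    or None if the line does not have that shape."""
    prefix = "start:"
    end = line.find(";")
    # Verify: the line must begin with "start:" and contain a ";".
    if not line.startswith(prefix) or end == -1:
        return None
    # Slice out the middle, split on commas and trim the surrounding spaces.
    return [token.strip() for token in line[len(prefix):end].split(",")]


print(parse_codes("start: c12354, c3456, 34526; other stuff that I don't care about"))
# ['c12354', 'c3456', '34526']
```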

This can be done (pretty elegantly) with a tool like Pyparsing:

from pyparsing import Group, Literal, OneOrMore, Optional, ParseException, Word
import string

# A code is an optional "c", a run of digits and an optional trailing comma;
# the defaults keep each group three tokens long, so c[1] is always the digits.
code = Group(Optional(Literal("c"), default='') + Word(string.digits) + Optional(Literal(","), default=''))
parser = Literal("start:") + OneOrMore(code) + Literal(";")
# Read lines from file:
with open('lines.txt', 'r') as f:
    for line in f:
        try:
            result = parser.parseString(line)
        except ParseException:
            # The line doesn't match the expected format; skip it.
            continue
        codes = [c[1] for c in result[1:-1]]
        # Do something with the codes...
        print(codes)

Cleaner than a regular expression: it returns a list of codes (the digits only, since c[1] skips the optional "c"), there is no need for str.split, and it ignores any extra characters after the ";", just like your example.

import re

sstr = re.compile(r'start:([^;]*);')  # captures everything between "start:" and ";"
slst = re.compile(r'c?(\d+)')         # optional "c", then digits; only the digits are captured

mystr = "start: c12354, c3456, 34526; other stuff that I don't care about"
match = sstr.match(mystr)
if match:
    res = slst.findall(match.group(1))
    print(res)

results in

['12354', '3456', '34526']
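The [^;]* part accepts anything between start: and ;, including text that isn't a list of codes. If the verification step should also check the shape of the list, one option is to validate the whole list with a pattern whose repeated part is non-capturing, then pull the codes out with findall. This is a sketch: extract_codes is a hypothetical name, the allowed separators (a comma with optional spaces) are an assumption, and unlike the answer above it keeps the "c" prefix.

```python
import re

# Verify: "start:", one or more codes separated by commas, then ";".
# The repeated part is (?:...) because its captures would be overwritten anyway.
HEADER = re.compile(r"start:\s*(c?\d+(?:\s*,\s*c?\d+)*)\s*;")
CODE = re.compile(r"c?\d+")


def extract_codes(line):
    """Hypothetical helper: list of codes, or None if the line is malformed."""
    m = HEADER.match(line)
    if m is None:
        return None
    return CODE.findall(m.group(1))


print(extract_codes("start: c12354, c3456, 34526; other stuff that I don't care about"))
# ['c12354', 'c3456', '34526']
print(extract_codes("start: c12354, oops; other stuff"))
# None
```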
