
Regular expression not matching re.match

I'm trying to write a regular expression that parses a response like this:

error code|error text|submission reference
2|missing or invalid fields|0

The code uses re.match(self.error_format).

I have tried error_format as:

(?P<status_code>[0-9]+)|(?P<status_message>.+)|(?P<gateway_message_id>[a-zA-Z0-9-]+)

but this matches the line error code|error text|submission reference, not the second line as needed.

I also tried:

(?P<status_code>[0-9]+)\|(?P<status_message>.+)\|(?P<gateway_message_id>[a-zA-Z0-9-]+)

but this does not match at all.

Update:

What I want is to match only 2|missing or invalid fields|0, but the full text is error code|error text|submission reference 2|missing or invalid fields|0, so it's as if I need to skip the first part.

That is:

msg = re.match(r'(?P<status_code>[0-9]+)\|(?P<status_message>.+)\|(?P<gateway_message_id>[a-zA-Z0-9-]+)', 'error code|error text|submission reference 2|missing or invalid fields|0')

Try not to match the separator. Like this:

 (?P<status_code>^[0-9][^|]*)\|(?P<status_message>[^|]+)\|(?P<gateway_message_id>.+)
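
As a quick check of this pattern, here is a minimal sketch that runs it against both lines from the question. The variable names (PATTERN, header, data) are only illustrative, and the pattern is written as a raw string so the backslashes reach the regex engine unchanged.

```python
import re

# Pattern from the answer above: the status code must start with a digit
# at the beginning of the string, and no field may contain a "|".
PATTERN = re.compile(
    r'(?P<status_code>^[0-9][^|]*)\|'
    r'(?P<status_message>[^|]+)\|'
    r'(?P<gateway_message_id>.+)'
)

header = 'error code|error text|submission reference'
data = '2|missing or invalid fields|0'

header_match = PATTERN.match(header)  # the header does not start with a digit
data_match = PATTERN.match(data)

print(header_match)
print(data_match.groupdict())
```

The header line is rejected because its first character is not a digit, while the data line fills all three named groups.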

msg = re.match(r'(?P<status_code>[0-9]+)\|(?P<status_message>[^|]+)\|(?P<gateway_message_id>[a-zA-Z0-9-]+)', '2|missing or invalid fields|0')

This matches perfectly, and you can then access the individual parts via msg.group('status_code') and so on.

The version without the backslashes will also match, but an unescaped | means alternation, so it will only catch the "2" and won't fill all three groups in your second-line example.
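
To make the difference between the two patterns concrete, here is a small sketch that applies both to the data line. The names unescaped and escaped are just for illustration.

```python
import re

line = '2|missing or invalid fields|0'

# Unescaped "|" is alternation: the pattern is three separate alternatives.
# The first one ([0-9]+) succeeds at position 0, so the others are never tried.
unescaped = re.match(
    r'(?P<status_code>[0-9]+)|(?P<status_message>.+)|(?P<gateway_message_id>[a-zA-Z0-9-]+)',
    line,
)

# Escaped "\|" is a literal pipe, so all three groups take part in one match.
escaped = re.match(
    r'(?P<status_code>[0-9]+)\|(?P<status_message>[^|]+)\|(?P<gateway_message_id>[a-zA-Z0-9-]+)',
    line,
)

print(unescaped.group(0), unescaped.groupdict())
print(escaped.group(0), escaped.groupdict())
```

With the unescaped pattern the whole match is just "2" and the other two groups are None; with the escaped pattern the whole line is matched and every group is filled.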

If you want to run this on a text with multiple lines, you can do

import re

# s holds the full multi-line response text
matches = re.finditer(r'(?P<status_code>[0-9]+)\|(?P<status_message>[^|]+)\|(?P<gateway_message_id>[a-zA-Z0-9-]+)', s)
for m in matches:
    print(m.group('status_code'), m.group('status_message'), m.group('gateway_message_id'))

Or go the other way around and match line by line:

for line in yourtext.split('\n'):
    # re.match only succeeds if the line starts with a numeric status code
    m = re.match(r'(?P<status_code>[0-9]+)\|(?P<status_message>[^|]+)\|(?P<gateway_message_id>[a-zA-Z0-9-]+)', line)
    if m:
        print(m.group('status_code'), m.group('status_message'), m.group('gateway_message_id'))

I think that covers all the options, and none of them will match your first line, which doesn't have a numeric error code in the first field.
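
The Update in the question puts the header and the data on a single line. re.match only tries to match at the start of the string, so it returns None there; re.search scans forward for the first position where the pattern fits. A minimal sketch, assuming the text really arrives as that one line (PATTERN, anchored and found are illustrative names):

```python
import re

PATTERN = re.compile(
    r'(?P<status_code>[0-9]+)\|'
    r'(?P<status_message>[^|]+)\|'
    r'(?P<gateway_message_id>[a-zA-Z0-9-]+)'
)

# The single-line text from the Update in the question.
text = 'error code|error text|submission reference 2|missing or invalid fields|0'

anchored = PATTERN.match(text)   # anchored at position 0, where "e" is not a digit
found = PATTERN.search(text)     # scans ahead to the first digit that starts a match

print(anchored)
print(found.group(0))
print(found.groupdict())
```

Here search skips the header text because no digit appears before the "2", so the match starts exactly at the data portion.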

import re

tests = '''\
error code|error text|submission reference
2|missing or invalid fields|0'''.splitlines()

for test in tests:
    pat = r'''(?x)
        (?P<status_code>[^|]+)
        [|](?P<status_message>.+)
        [|](?P<gateway_message_id>[\w\d-]+)'''

    print(re.match(pat, test).groups())

yields

('error code', 'error text', 'submission')
('2', 'missing or invalid fields', '0')


 