A short time ago I had an almost identical problem to this one, and it was fixed by using string literals instead of literal strings. This time I took care to use string literals, but that didn't fix the problem.
I am trying to extract a section from a string, and the results I get from Python differ from what regex101 shows I should be getting. I'm using this pattern:
Supersedes:?[\\r\\n ]+(?:[A-Za-z\-0-9])*[\\w\-\\s]+[\\r\n ]+(.*)[\\r\\n ]+Serial Numbers:?
to match this text:
\\r\\n\\r\\nSupersedes\\r\\nNone\\r\\n\\r\\nChanges to VGA-77 gas module assembly (0110444290)\\r\\n\\r\\nService Serial Numbers:\\r\\nUS00000000-US99999999\\r\\n\\r
I'm expecting the first captured group to give me
n\r\nChanges to VGA-77 gas module assembly (0110444290)\r\n\r\nService
https://regex101.com/r/eHdhBV/2
But when I try this in Python:
import re
rx = r'Supersedes:?[\r\n ]+(?:[A-Za-z\-0-9])*[\w\-\s]+[\r\n ]+(.*)[\r\n ]+Serial Numbers:?'
string = '\r\n\r\nSupersedes\r\nNone\r\n\r\nChanges to VGA-77 gas module assembly (0110444290)\r\n\r\nService Serial Numbers:\r\nUS00000000-US99999999\r\n\r'
result = re.search(rx, string, re.M|re.S)
result[1]
'(0110444290)\r\n\r\nService'
The result is not the same as what is shown on regex101. What's causing this?
To solve the current issue, you may use
import re

m = re.search(r'Supersedes:?\s*[^\r\n]*[\r\n]+(.*?)[ \r\n]+Serial Numbers', s, re.S)
if m:
    print(m.group(1))  # group 1 is the section between the "Supersedes" block and "Serial Numbers"
See the regex demo online.
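To check this locally, here is a minimal sketch that runs the pattern against the sample text from the question. The names `s`, `pattern`, and `captured` are only illustrative, and the string is written as a normal Python literal so that `\r` and `\n` are real control characters.

```python
import re

# Sample text from the question. In a normal (non-raw) literal,
# '\r' and '\n' are real CR and LF characters.
s = ('\r\n\r\nSupersedes\r\nNone\r\n\r\n'
     'Changes to VGA-77 gas module assembly (0110444290)\r\n\r\n'
     'Service Serial Numbers:\r\nUS00000000-US99999999\r\n\r')

# Raw string, so the backslashes reach the regex engine unchanged.
pattern = r'Supersedes:?\s*[^\r\n]*[\r\n]+(.*?)[ \r\n]+Serial Numbers'

m = re.search(pattern, s, re.S)
captured = m.group(1) if m else None

# repr() makes the CR/LF characters visible in the output.
print(repr(captured))
```

Printing with `repr()` is a deliberate choice here: a plain `print(captured)` would render the line breaks instead of showing them as `\r\n`.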
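Please note that you should use literal strings in online regex testers: convert the \n and \r escape sequences in your test text into actual line breaks. A tester treats the test string as plain text, so \r\n typed there is four ordinary characters, not a CR followed by an LF.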
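The difference between the two kinds of input can be reproduced in Python itself. The sketch below uses a shortened, made-up sample (`real` and `pasted` are illustrative names, not the full text from the question) and the original pattern from the question. It suggests that the same pattern sees two different inputs depending on whether the escapes were interpreted.

```python
import re

# The pattern from the question, unchanged.
rx = r'Supersedes:?[\r\n ]+(?:[A-Za-z\-0-9])*[\w\-\s]+[\r\n ]+(.*)[\r\n ]+Serial Numbers:?'

# Normal literal: '\r\n' is two control characters (CR, LF).
real = 'Supersedes\r\nNone\r\n\r\nChanges (1)\r\n\r\nService Serial Numbers:'
# Raw literal: r'\r\n' is four plain characters, which is what a tester
# receives when escaped text is pasted into its test-string box.
pasted = r'Supersedes\r\nNone\r\n\r\nChanges (1)\r\n\r\nService Serial Numbers:'

print(len('\r\n'), len(r'\r\n'))  # 2 4

real_match = re.search(rx, real, re.S)
pasted_match = re.search(rx, pasted, re.S)

# On real line breaks, the greedy [\w\-\s]+ consumes word characters,
# hyphens and whitespace up to the first '(' so group 1 starts there.
print(repr(real_match.group(1)))

# On the pasted form, a backslash follows "Supersedes", and [\r\n ]+
# cannot match a backslash, so there is no match at all.
print(pasted_match)
```

So a result seen in a tester is only comparable to Python's result when both are fed the same characters.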
Pattern details
Supersedes:? - Supersedes: or Supersedes
\s* - any 0+ whitespace characters
[^\r\n]* - any 0+ chars other than LF and CR
[\r\n]+ - 1+ LF or CR symbols
(.*?) - Group 1: any 0+ chars (including line breaks, because of re.S), as few as possible
[ \r\n]+ - 1+ spaces, CR or LF
Serial Numbers - a literal Serial Numbers string.
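The same breakdown can be kept next to the pattern itself with `re.VERBOSE` (`re.X`). This is only a sketch: `ANNOTATED` and `sample` are hypothetical names, and the sample text is made up for illustration. In verbose mode unescaped whitespace in the pattern is ignored, so the space in "Serial Numbers" is written as `[ ]` to keep it.

```python
import re

ANNOTATED = re.compile(r'''
    Supersedes:?        # "Supersedes" with an optional colon
    \s*                 # zero or more whitespace characters
    [^\r\n]*            # rest of the line: anything except CR and LF
    [\r\n]+             # one or more CR or LF characters
    (.*?)               # group 1: as few characters as possible
    [ \r\n]+            # one or more spaces, CRs or LFs
    Serial[ ]Numbers    # literal "Serial Numbers", space kept via a class
''', re.S | re.X)

# The compact form from the answer, for comparison.
COMPACT = re.compile(r'Supersedes:?\s*[^\r\n]*[\r\n]+(.*?)[ \r\n]+Serial Numbers', re.S)

# Made-up sample text with real CR/LF characters.
sample = 'Supersedes:\r\nNone\r\n\r\nBody text\r\n\r\nService Serial Numbers:\r\nX'

verbose_group = ANNOTATED.search(sample).group(1)
compact_group = COMPACT.search(sample).group(1)
print(repr(verbose_group))
print(verbose_group == compact_group)
```

Whitespace inside a character class is still significant in verbose mode, which is why `[ \r\n]+` needs no change.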