
Python 3.6 Regex Producing Unexpected Results (despite using string literals)

A short time ago I had an almost identical problem, and it was fixed by using raw string literals (the r'...' prefix) instead of regular string literals. This time I took care to use raw string literals, but it didn't fix the problem.

I am trying to extract a section from a string, and the results I get from Python are different from what regex101 shows I should be getting. I'm using this pattern

Supersedes:?[\\r\\n ]+(?:[A-Za-z\-0-9])*[\\w\-\\s]+[\\r\n ]+(.*)[\\r\\n ]+Serial Numbers:?

to match this text:

\\r\\n\\r\\nSupersedes\\r\\nNone\\r\\n\\r\\nChanges to VGA-77 gas module assembly (0110444290)\\r\\n\\r\\nService Serial Numbers:\\r\\nUS00000000-US99999999\\r\\n\\r

I'm expecting the first captured group to give me

n\r\nChanges to VGA-77 gas module assembly (0110444290)\r\n\r\nService

https://regex101.com/r/eHdhBV/2

But when I try this in Python:

import re

rx = r'Supersedes:?[\r\n ]+(?:[A-Za-z\-0-9])*[\w\-\s]+[\r\n ]+(.*)[\r\n ]+Serial Numbers:?'
string = '\r\n\r\nSupersedes\r\nNone\r\n\r\nChanges to VGA-77 gas module assembly (0110444290)\r\n\r\nService Serial Numbers:\r\nUS00000000-US99999999\r\n\r'
result = re.search(rx, string, re.M|re.S)
print(repr(result[1]))
# '(0110444290)\r\n\r\nService'

The result is not the same as what is shown on regex101. What's causing this?

To solve the current issue, you may use

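import re

# s is the text to search (the value assigned to string in the question)
m = re.search(r'Supersedes:?\s*[^\r\n]*[\r\n]+(.*?)[ \r\n]+Serial Numbers', s, re.S)
if m:
    print(m.group(1))  # group 1 holds the captured section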

See the regex demo online.

Please note that online regex testers need the actual text rather than the Python string literal: convert the \r and \n escape sequences into real line breaks before pasting the test string.
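The mismatch can be reproduced entirely in Python. The sketch below assumes the regex101 test string contained the two-character sequences backslash-r and backslash-n instead of real line breaks, and that the doubled-backslash pattern from the question was the one entered there. The variable names are only illustrative.

```python
import re

# Assumption: regex101 received backslash-r / backslash-n as plain characters,
# not as real line breaks. A raw string reproduces that text.
literal_text = r'\r\n\r\nSupersedes\r\nNone\r\n\r\nChanges to VGA-77 gas module assembly (0110444290)\r\n\r\nService Serial Numbers:\r\nUS00000000-US99999999\r\n\r'
# The doubled-backslash pattern from the question, as typed into the tester.
tester_rx = r'Supersedes:?[\\r\\n ]+(?:[A-Za-z\-0-9])*[\\w\-\\s]+[\\r\n ]+(.*)[\\r\\n ]+Serial Numbers:?'

# The same text with real CR/LF characters, which is what Python sees.
real_text = literal_text.replace('\\r', '\r').replace('\\n', '\n')
python_rx = r'Supersedes:?[\r\n ]+(?:[A-Za-z\-0-9])*[\w\-\s]+[\r\n ]+(.*)[\r\n ]+Serial Numbers:?'

tester_group = re.search(tester_rx, literal_text, re.M | re.S)[1]
python_group = re.search(python_rx, real_text, re.M | re.S)[1]

print(repr(tester_group))  # the value regex101 reported (repr doubles the backslashes)
print(repr(python_group))  # the value Python returned in the question
```

With real line breaks, [\w\-\s]+ runs through the line breaks and the words up to the opening parenthesis, because \s matches CR and LF and \w matches letters and digits. That is why group 1 starts at "(" in Python.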

Pattern details

  • Supersedes:? - Supersedes: or Supersedes
  • \s* - zero or more whitespace characters
  • [^\r\n]* - zero or more characters other than LF and CR
  • [\r\n]+ - one or more LF or CR characters
  • (.*?) - Group 1: zero or more characters of any kind (line breaks included, because of re.S), as few as possible
  • [ \r\n]+ - one or more spaces, CRs or LFs
  • Serial Numbers - the literal string Serial Numbers.
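The breakdown above can also live next to the pattern itself. This is a minimal sketch using re.VERBOSE, applied to the string from the question. The names pattern and text are only illustrative, and the space in Serial Numbers is written as [ ] because verbose mode ignores unescaped whitespace outside character classes.

```python
import re

# The pattern from the answer, spelled out with re.VERBOSE (re.X)
# so that each piece carries its own comment.
pattern = re.compile(r"""
    Supersedes:?      # 'Supersedes:' or 'Supersedes'
    \s*               # zero or more whitespace characters
    [^\r\n]*          # zero or more characters other than CR and LF
    [\r\n]+           # one or more CR or LF characters
    (.*?)             # group 1: any characters, as few as possible
    [ \r\n]+          # one or more spaces, CRs or LFs
    Serial[ ]Numbers  # the literal text 'Serial Numbers'
""", re.S | re.X)

# The input string from the question, with real CR/LF characters.
text = '\r\n\r\nSupersedes\r\nNone\r\n\r\nChanges to VGA-77 gas module assembly (0110444290)\r\n\r\nService Serial Numbers:\r\nUS00000000-US99999999\r\n\r'

m = pattern.search(text)
if m:
    print(repr(m.group(1)))
```

Group 1 here runs from "Changes" to "Service": the [^\r\n]* part consumes the "None" line, and the lazy (.*?) stops at the first place where spaces or line breaks are followed by Serial Numbers.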
