Ignore white spaces in a regular expression

I would like to ignore white spaces and parse a pattern like (int, int) xx (int, int). For example:

import re
m = re.match(r"[\s]*\([\s]*(\d+)[\s]*,[\s]*(\d+)[\s]*\)[\s]*xx[\s]*\([\s]*(\d+)[\s]*,[\s]*(\d+)[\s]*\)[\s]*", "   (2,  74) xx   (5  ,6), physicist")
print (m.group(0)) #    (2,  74) xx   (5  ,6)
print (m.group(1)) # 2
print (m.group(2)) # 74
print (m.group(3)) # 5
print (m.group(4)) # 6

As you can see, my pattern contains a lot of [\s]* to match zero or more whitespace characters. Is there a simpler way to write this pattern?

I don't know of a method baked into regex, but the easiest solution that comes to mind is using a simple string replace:

import re
m = re.match(r"\((\d+),(\d+)\)xx\((\d+),(\d+)\)", "   (2,  74) xx   (5  ,6), physicist".replace(' ', ''))
print (m.group(0)) # (2,74)xx(5,6)
print (m.group(1)) # 2
print (m.group(2)) # 74
print (m.group(3)) # 5
print (m.group(4)) # 6

You could also use regex to remove any kind of whitespace (not just spaces):

import re
s = re.sub(r'\s+', '', '   (2,  74) xx   (5  ,6), physicist')
m = re.match(r"\((\d+),(\d+)\)xx\((\d+),(\d+)\)", s)
print (m.group(0)) # (2,74)xx(5,6)
print (m.group(1)) # 2
print (m.group(2)) # 74
print (m.group(3)) # 5
print (m.group(4)) # 6
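
One caveat with stripping whitespace first: it also joins digits that were separate in the input, so the two approaches are not strictly equivalent. A small sketch of the difference, using a made-up input string (the variable names here are only for illustration):

```python
import re

stripped = r"\((\d+),(\d+)\)xx\((\d+),(\d+)\)"
text = "(2, 7 4) xx (5, 6)"  # made-up input: "7 4" is two numbers, not 74

# After deleting whitespace, "7 4" collapses into "74" and the match succeeds.
collapsed = re.match(stripped, re.sub(r"\s+", "", text))
print(collapsed.groups())  # ('2', '74', '5', '6')

# A whitespace-tolerant pattern like the one in the question rejects the same input.
tolerant = r"\s*\(\s*(\d+)\s*,\s*(\d+)\s*\)\s*xx\s*\(\s*(\d+)\s*,\s*(\d+)\s*\)"
print(re.match(tolerant, text))  # None
```

If inputs like this cannot occur in your data, the preprocessing approach is probably fine.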

The straightforward answer is no. Even though they are only white spaces, they are still characters, so they are part of the pattern. I can think of a few options:

  1. Preprocess your string by removing unwanted white spaces.
  2. Find another way to express your pattern.
  3. Use alternative methods for matching.

For example:

>>> import re
>>> re.findall(r'\d+', "   (2,  74) xx   (5  ,6), physicist")
['2', '74', '5', '6']
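
As for option 2, one possible way to keep the original matching behaviour without typing the whitespace part by hand is to build the pattern from its pieces. This is only a sketch; `spaced` and `INT` are hypothetical names I made up, not anything provided by the re module:

```python
import re

def spaced(*parts):
    # Hypothetical helper: put optional whitespace before, between and after the parts.
    ws = r"\s*"
    return ws + ws.join(parts) + ws

INT = r"(\d+)"  # one captured integer
pattern = spaced(r"\(", INT, ",", INT, r"\)", "xx", r"\(", INT, ",", INT, r"\)")

m = re.match(pattern, "   (2,  74) xx   (5  ,6), physicist")
print(m.groups())  # ('2', '74', '5', '6')
```

The resulting pattern is the same as the one in the question (with \s* in place of [\s]*), so it does not change what is matched, only how the pattern is written.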

If you want to simplify your specific pattern, you could remove all spaces in a separate step beforehand, since they are not relevant to your pattern.

Example:

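import re
# named "text" rather than "input" to avoid shadowing the built-in input()
text = '   (2,  74) xx   (5  ,6), physicist'
m = re.match(r"\((\d+),(\d+)\)xx\((\d+),(\d+)\)", text.replace(' ', ''))
print (m.groups()) # ('2', '74', '5', '6')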

I think all you want is to get the four integers, so you can delete all whitespace and then match:

import re
a = '(  2 , 74 ) xx (5       , 6 )'
b = re.sub(r'\s+','',a)
m = re.match(r'\((\d+),(\d+)\)xx\((\d+),(\d+)\)',b)
print (m.group(0)) # (2,74)xx(5,6)
print (m.group(1)) # 2
print (m.group(2)) # 74
print (m.group(3)) # 5
print (m.group(4)) # 6
