
Python Regex: Extract all occurrences of a substring within a string

I am trying to extract all occurrences of a substring within a string using Python Regex. This is what I have tried:

import re
line = "The dimensions of the first rectangle: 10'x20', second rectangle: 10x35cm, third rectangle: 30x35cm"
m = re.findall(r'\d+x.*?[a-zA-Z]', line)
print(m)

The output I am getting is ['10x35c', '30x35c']

The output I am trying to achieve is ["10'x20'", '10x35cm', '30x35cm']
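
For reference, a small check of why the attempted pattern behaves this way: `\d+x` requires the digits to be followed directly by `x`, so `10'x20'` never starts a match, and the lazy `.*?` stops at the first letter it can reach, which cuts `cm` down to `c`. This is only a diagnostic sketch using the same input line.

```python
import re

line = "The dimensions of the first rectangle: 10'x20', second rectangle: 10x35cm, third rectangle: 30x35cm"

# "\d+x" needs digits immediately before "x", so "10'x20'" is skipped.
# ".*?" is lazy, so the match ends at the first letter after the "x".
attempt = re.findall(r"\d+x.*?[a-zA-Z]", line)
print(attempt)  # ['10x35c', '30x35c']
```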

You can do this without a regex by using split:

In [1089]: m = [i.split(':')[1].strip() for i in line.split(',')]

In [1090]: m
Out[1090]: ["10'x20'", '10x35cm', '30x35cm']
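
Note that the split approach relies on every comma-separated piece containing a colon with the value after it. A slightly more defensive sketch, assuming the same "label: value" layout (the helper name `values_after_colon` is made up for illustration), skips pieces that have no colon instead of raising an IndexError:

```python
def values_after_colon(text):
    """Return the text after the first colon in each comma-separated piece."""
    values = []
    for piece in text.split(','):
        # partition never raises; sep is '' when the piece has no colon
        _, sep, value = piece.partition(':')
        if sep:
            values.append(value.strip())
    return values

line = "The dimensions of the first rectangle: 10'x20', second rectangle: 10x35cm, third rectangle: 30x35cm"
print(values_after_colon(line))  # ["10'x20'", '10x35cm', '30x35cm']
```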

You may use this regex:

r"\d+['\"]?x\d+['\"]?(?:\s*[a-zA-Z]+)?"


Code:

>>> import re
>>> line = "The dimensions of the first rectangle: 10'x20', second rectangle: 10x35cm, third rectangle: 30x35cm"
>>> print(re.findall(r"\d+['\"]?x\d+['\"]?(?:\s*[a-zA-Z]+)?", line))
["10'x20'", '10x35cm', '30x35cm']

RegEx Details:

  • \d+ : Match 1+ digits
  • ['\"]? : Match optional ' or "
  • x : Match letter x
  • \d+ : Match 1+ digits
  • ['\"]? : Match optional ' or "
  • (?:\s*[a-zA-Z]+)? : Match optional units comprising 1+ letters
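
The breakdown above can also be kept next to the pattern itself by compiling it with re.VERBOSE, which ignores whitespace in the pattern and allows `#` comments. This is a sketch of the same regex; the name `DIMENSION` is chosen here for illustration.

```python
import re

DIMENSION = re.compile(r"""
    \d+                  # one or more digits
    ['"]?                # optional single or double quote mark
    x                    # literal letter x
    \d+                  # one or more digits
    ['"]?                # optional single or double quote mark
    (?:\s*[a-zA-Z]+)?    # optional unit: letters, possibly after whitespace
""", re.VERBOSE)

line = "The dimensions of the first rectangle: 10'x20', second rectangle: 10x35cm, third rectangle: 30x35cm"
print(DIMENSION.findall(line))  # ["10'x20'", '10x35cm', '30x35cm']
```

One thing to keep in mind: because the unit part allows whitespace before the letters, a word that directly follows a dimension (as in "10x20 and") would be picked up as if it were a unit.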

Use

import re
string = "The dimensions of the first rectangle: 10'x20', second rectangle: 10x35cm, third rectangle: 30x35cm"
print(re.findall(r"""\d+'?x\d+'?(?: *[a-z]+)?""", string, re.I))

Results: ["10'x20'", '10x35cm', '30x35cm']

re.I stands for case-insensitive matching.
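
To illustrate what the flag changes, here is a small comparison on a made-up sample string with an uppercase unit (the sample is not from the question):

```python
import re

pattern = r"\d+'?x\d+'?(?: *[a-z]+)?"
sample = "10x35CM"  # hypothetical input with an uppercase unit

# Without re.I, [a-z] cannot match "CM", so the optional unit is skipped.
print(re.findall(pattern, sample))        # ['10x35']
# With re.I, [a-z] also matches uppercase letters.
print(re.findall(pattern, sample, re.I))  # ['10x35CM']
```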

Explanation:

  • \d+ : Match 1+ digits (as many as possible)
  • '? : Match an optional '
  • x : Match the letter x (or X, because of re.I)
  • \d+ : Match 1+ digits (as many as possible)
  • '? : Match an optional '
  • (?: *[a-z]+)? : Optional non-capturing group: 0+ spaces followed by 1+ letters (either case, because of re.I)
