I am trying to extract all occurrences of a substring within a string using Python Regex. This is what I have tried:
import re
line = "The dimensions of the first rectangle: 10'x20', second rectangle: 10x35cm, third rectangle: 30x35cm"
m = re.findall(r'\d+x.*?[a-zA-Z]', line)
print (m)
The output I am getting is ['10x35c', '30x35c']
The output I am trying to achieve is ["10'x20'", '10x35cm', '30x35cm']
You can do this without regex, using split:
In [1089]: m = [i.split(':')[1].strip() for i in line.split(',')]
In [1090]: m
Out[1090]: ["10'x20'", '10x35cm', '30x35cm']
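Note that this relies on every comma-separated item containing a colon, with the value after it. A slightly more defensive sketch of the same idea, assuming the input keeps that "label: value" shape (the helper name extract_dimensions is made up for illustration):

```python
def extract_dimensions(text):
    """Return the text after the first ':' in each comma-separated item."""
    results = []
    for item in text.split(','):
        # partition() never raises; sep is '' when the item has no colon
        _, sep, value = item.partition(':')
        if sep:
            results.append(value.strip())
    return results

line = "The dimensions of the first rectangle: 10'x20', second rectangle: 10x35cm, third rectangle: 30x35cm"
print(extract_dimensions(line))
```

Items without a colon are skipped instead of raising an IndexError, which is what split(':')[1] would do.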
You may use this regex:
r"\d+['\"]?x\d+['\"]?(?:\s*[a-zA-Z]+)?"
Code:
>>> import re
>>> line = "The dimensions of the first rectangle: 10'x20', second rectangle: 10x35cm, third rectangle: 30x35cm"
>>> print (re.findall(r"\d+['\"]?x\d+['\"]?(?:\s*[a-zA-Z]+)?", line))
["10'x20'", '10x35cm', '30x35cm']
RegEx Details:
\d+ : Match 1+ digits
['\"]? : Match optional ' or "
x : Match letter x
\d+ : Match 1+ digits
['\"]? : Match optional ' or "
(?:\s*[a-zA-Z]+)? : Match optional units comprising 1+ letters
Use:
import re
string = "The dimensions of the first rectangle: 10'x20', second rectangle: 10x35cm, third rectangle: 30x35cm"
print(re.findall(r"""\d+'?x\d+'?(?: *[a-z]+)?""", string, re.I))
Results: ["10'x20'", '10x35cm', '30x35cm']
re.I stands for case-insensitive matching.
Explanation:
--------------------------------------------------------------------------------
  \d+                      digits (0-9) (1 or more times (matching
                           the most amount possible))
--------------------------------------------------------------------------------
  '?                       '\'' (optional (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  x                        'x'
--------------------------------------------------------------------------------
  \d+                      digits (0-9) (1 or more times (matching
                           the most amount possible))
--------------------------------------------------------------------------------
  '?                       '\'' (optional (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
     *                       ' ' (0 or more times (matching the most
                             amount possible))
--------------------------------------------------------------------------------
    [a-z]+                   any character of: 'a' to 'z' (1 or more
                             times (matching the most amount possible))
--------------------------------------------------------------------------------
  )?                       end of grouping
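The breakdown above can also live inside the pattern itself. Here is a minimal sketch using re.VERBOSE, assuming the same input string; the name DIMENSION_RE is just illustrative, and the space is written as [ ] because verbose mode ignores bare whitespace in the pattern:

```python
import re

# Same pattern as above, spread out with re.VERBOSE so each part has a comment.
DIMENSION_RE = re.compile(
    r"""
    \d+                # first number: one or more digits
    '?                 # optional apostrophe
    x                  # literal separator 'x'
    \d+                # second number: one or more digits
    '?                 # optional apostrophe
    (?:[ ]*[a-z]+)?    # optional unit, possibly preceded by spaces
    """,
    re.VERBOSE | re.IGNORECASE,
)

string = "The dimensions of the first rectangle: 10'x20', second rectangle: 10x35cm, third rectangle: 30x35cm"
print(DIMENSION_RE.findall(string))
```

One thing to keep in mind: because the unit part allows leading spaces, a word that follows a bare dimension (as in "10x20 and") would be picked up as if it were a unit. Drop the [ ]* if units are always attached directly.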