I have a pattern as below:
measurement = re.compile(r"(\d+(?:\.\d*)?)\s*x\s*(\d+(?:\.\d*)?)\s*(cm|mm|millimeter|centimeter|millimeters|centimeters)")
It can occur several times in a sentence and in a document. I want to find all matches and replace each one with "MEASUREMENT", and I also want to collect the matched values in a list.
**Input_Text**: measuring 9 x 5 mm and previously measuring 8 x 6 mm
**Output**: measuring MEASUREMENT and previously measuring MEASUREMENT
**List**: 9 x 5 mm, 8 x 6 mm
This is my code so far, but it only finds the first match:
result = re.search(measurement, Input_Text)
if result:
    Input_Text = Input_Text.replace(result, "MEASUREMENT")
You can use re.sub() for the replacement, and re.findall() to get all matched strings.
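import re
# Raw string so that \d and \s reach the regex engine unchanged
measurement = re.compile(r"(\d+(?:\.\d*)?)\s*x\s*(\d+(?:\.\d*)?)\s*(cm|mm|millimeter|centimeter|millimeters|centimeters)")
text = "measuring 9 x 5 mm and previously measuring 8 x 6 mm"
# With capturing groups in the pattern, findall returns one tuple of groups per match
values = re.findall(pattern=measurement, string=text)
sub_text = re.sub(pattern=measurement, string=text, repl='MEASUREMENT')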
>>> sub_text
'measuring MEASUREMENT and previously measuring MEASUREMENT'
>>> values
[('9', '5', 'mm'), ('8', '6', 'mm')]
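Because the pattern contains capturing groups, re.findall() returns tuples of groups rather than the full matched text. If the list should hold the whole matches, as in the expected output of the question, one option is re.finditer() with group(0). A minimal sketch reusing the same pattern and sample text (the name full_matches is only illustrative):

```python
import re

measurement = re.compile(
    r"(\d+(?:\.\d*)?)\s*x\s*(\d+(?:\.\d*)?)\s*"
    r"(cm|mm|millimeter|centimeter|millimeters|centimeters)"
)
text = "measuring 9 x 5 mm and previously measuring 8 x 6 mm"

# group(0) is the entire matched substring, regardless of capturing groups
full_matches = [m.group(0) for m in measurement.finditer(text)]
print(full_matches)  # ['9 x 5 mm', '8 x 6 mm']
```

Note that the alternation is tried left to right, so with this pattern "millimeters" would match only as "millimeter" and leave the trailing "s" in the text. Writing the units as millimeters? and centimeters? avoids that.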
If you don't want to scan your string twice, you can use re.sub with a function as the replacement argument. The function is called once per match, so it can populate a list of the matched strings while returning the replacement text.
import re
pat = re.compile(r'\d+(?:\.\d*)?\s*x\s*\d+(?:\.\d*)?\s*(?:cm|mm|millimeters?|centimeters?)')
s = r'measuring 9 x 5 mm and previously measuring 8 x 6 mm'
l = []  # collects the full text of every match
def repl(m):
    l.append(m.group(0))  # record the matched measurement
    return 'MEASUREMENT'  # text substituted for the match
s = pat.sub(repl, s)
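If a module-level list is not wanted, the same idea can be wrapped in a function that returns both the rewritten text and the matches. This is only a sketch: the function name extract_measurements and its placeholder parameter are hypothetical and not part of the answer above.

```python
import re

PAT = re.compile(
    r'\d+(?:\.\d*)?\s*x\s*\d+(?:\.\d*)?\s*(?:cm|mm|millimeters?|centimeters?)'
)

def extract_measurements(text, placeholder='MEASUREMENT'):
    """Replace every measurement in text and return (new_text, matches)."""
    found = []

    def repl(m):
        # Called once per match, in left-to-right order
        found.append(m.group(0))
        return placeholder

    return PAT.sub(repl, text), found

new_text, values = extract_measurements(
    'measuring 9 x 5 mm and previously measuring 8 x 6 mm'
)
print(new_text)  # measuring MEASUREMENT and previously measuring MEASUREMENT
print(values)    # ['9 x 5 mm', '8 x 6 mm']
```

Keeping the list local to each call means repeated calls on different documents do not accumulate matches from earlier ones.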