I am trying to clean some data that includes text like '6cm*8cm', '6cmx8cm', and '6*8', and I want to normalize these to a single form. The numbers vary, so the data may also contain '3cm*4cm', etc.
# input strings
strings = [
"6cm*8cm",
"12mmx15mm",
'Device stemmer 2mm*8mm',
'Device stemming 2mmx8mm'
]
# My desired output would be:
desired_strings = [
'6*8',
'12*15',
'Device stemmer 2*8',
'Device stemming 2*8'
]
I am using Python's 're' module. My preference is to convert them to a simple '6*8' (i.e., number*number). Some entries contain other words, such as 'Device stemmer 2mm*8mm', and I do not want to change those words.
Is there a Pythonic way to use a regex to rewrite all possible combinations of paired numbers and units?
I used:
import re
strings = [
"6cm*8cm",
"12mmx15mm",
'Device stemmer 2mm*8mm',
'Device stemming 2mmx8mm'
]
for i in strings:
    result = re.sub(r"([0-9]+)(cm|mm)(\*|x)([0-9]+)(cm|mm)", r"\1*\4", i)
    print(result)
Notes:
([0-9]+): matches numbers
(cm|mm): matches units, and | stands for logical OR
(\*|x): matches \* or x as the separator of pairs
\1: gives the first group (here the first number, e.g., 6)
\4: gives the fourth group (here the second number, e.g., 8)
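The pattern above requires a unit after both numbers and no spaces around the separator. If the data might also contain forms such as '6 cm x 8 cm' or '6x8', a slightly more tolerant variant can be sketched as below. The names UNITS, DIMENSION and normalize_dimensions are hypothetical, and the unit list is an assumption that should be adjusted to the actual data. Note that a bare 'x' between two numbers (for example a hexadecimal literal like '0x10') would also be rewritten, so this may be too permissive for some inputs.

```python
import re

# Assumed unit list; extend it to match the units present in the data.
UNITS = r"(?:cm|mm)"

# number, optional unit, separator (* or x), number, optional unit.
# Non-capturing groups (?:...) keep only the two numbers as groups 1 and 2.
DIMENSION = re.compile(
    rf"\b(\d+)(?:\s*{UNITS})?\s*[*x]\s*(\d+)(?:\s*{UNITS})?\b",
    flags=re.IGNORECASE,
)


def normalize_dimensions(text):
    """Rewrite each 'number[unit] sep number[unit]' pair as 'number*number'."""
    return DIMENSION.sub(r"\1*\2", text)


strings = [
    "6cm*8cm",
    "12mmx15mm",
    "Device stemmer 2mm*8mm",
    "Device stemming 2mmx8mm",
]
for s in strings:
    print(normalize_dimensions(s))
```

Compiling the pattern once avoids rebuilding it for every string, and keeping the unit list in one place makes it easier to add units later.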
https://regex101.com/ and this answer helped.