
Regex: check whether a string contains numbers followed by units in Python, and modify it

I am trying to clean some data that includes text like '6cm*8cm', '6cmx8cm', and '6*8'. I want to normalize these so that they all have the same form. Note that the numbers vary, so the data may also contain '3cm*4cm', etc.

# input strings
strings = [
    "6cm*8cm",
    "12mmx15mm",
    'Device stemmer 2mm*8mm',
    'Device stemming 2mmx8mm'
]
# My desired output would be:
desired_strings = [
    '6*8',
    '12*15',
    'Device stemmer 2*8',
    'Device stemming 2*8'
]

I am using Python's 're' module. My preference is to convert them to a simple '6*8' (i.e., number*number). Note that some entries contain additional text, such as 'Device stemmer 2mm*8mm', and I do not want to change the other words.

Is there a Pythonic way to use a regex to modify all the possible combinations of numbers and units paired with each other?

I used:

import re

strings = [
    "6cm*8cm",
    "12mmx15mm",
    'Device stemmer 2mm*8mm',
    'Device stemming 2mmx8mm'
]

for i in strings:
    result = re.sub(r"([0-9]+)(cm|mm)(\*|x)([0-9]+)(cm|mm)", r"\1*\4", i)
    print(result)

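Notes:
([0-9]+) : matches one or more digits (the number),
(cm|mm) : matches the unit; | means alternation (logical OR),
(\*|x) : matches a literal * or x as the separator between the pair (the * is escaped as \* because a bare * is a regex quantifier),
\1 : refers to the first group (here the first number, e.g. 6),
\4 : refers to the fourth group (here the second number, e.g. 8)

A string that is already in the form '6*8' has no units, so the pattern does not match it and it is left unchanged.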
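If the data contains more units than cm and mm, or pairs where the unit is missing, the same idea can be generalized. The following is only a sketch under assumptions: the UNITS tuple and the normalize_dimensions helper are hypothetical names, not part of the original answer, and the unit list should be adjusted to the real data. It uses non-capturing groups (?:...) for the units, so the two numbers become groups 1 and 2, and a trailing \b so that a unit is not matched in the middle of a longer word.

```python
import re

# Hypothetical unit list (an assumption); extend it to match your data.
UNITS = ("cm", "mm", "m", "in")

# Try longer units first so that "mm" is attempted before "m".
_unit = "|".join(sorted(map(re.escape, UNITS), key=len, reverse=True))

# number, optional unit, separator (* or x), number, optional unit.
# (?:...) groups without capturing, so only the two numbers are captured.
DIMENSION = re.compile(rf"(\d+)(?:{_unit})?[*x](\d+)(?:{_unit})?\b")


def normalize_dimensions(text):
    """Rewrite pairs such as '6cm*8cm' or '6cmx8cm' as '6*8'."""
    return DIMENSION.sub(r"\1*\2", text)


strings = [
    "6cm*8cm",
    "12mmx15mm",
    "Device stemmer 2mm*8mm",
    "Device stemming 2mmx8mm",
]

for s in strings:
    print(normalize_dimensions(s))
```

Because the units are optional here, a bare pair such as '2x4' is also rewritten to '2*4'. If that is too permissive for the data, drop the two ? quantifiers so that units are required again, as in the original pattern.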

https://regex101.com/ and this answer helped.

