Regular expression to extract fractions

I am looking for a regex that matches fractions of the form [0-9]\/[1-9]{1,2} in a given string.

Below is an example:

my_str = "This is a 1/2 1/4. Press 1/2/3. He drove a car for 1/2hour." # A free text

def replace_fractions(text):
    fraction_dict = {
        '1/2': 'half',
        '1/4': 'quarter',
        '3/4': 'three quarters',
        '2/3': 'two thirds',
    }
    _tmp = ' '.join([fraction_dict.get(w, w).strip() for w in text.split()])
    return _tmp

current_result = replace_fractions("This is a 1/2 1/4. Press 1/2/3. He drove a car for 1/2hour.")

current_result:

"This is a half 1/4. Press 1/2/3. He drove a car for 1/2hour."

expected_result:

"This is a half quarter. Press 1/2/3. He drove a car for half hour."

Clearly, a regex is needed to handle cases such as 1/2/3, 1/4. and 1/2hour.

However, [0-9]\/[1-9]{1,2} matches in every one of these cases, including the 1/2 inside 1/2/3. What is the right regex to handle them?

Note: The regex only needs to handle the cases above. Edge cases can be ignored (or will be addressed in an edit after expert comments).

You may use the following return in your method:

return re.sub(r'(?<!\d)(?<!\d/)[0-9]/[0-9]{1,2}(?!/?\d)', lambda x: fraction_dict.get(x.group(), x.group()), text)

See the Python demo. Note that the space between half and hour is missing in the output because it was missing in the input. You would need extra logic to add a space only where one is expected.
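One possible way to add that space, sketched under the assumption that a space is wanted only when a letter directly follows a replaced fraction. The names FRACTION_WORDS, FRACTION_RE and replace_fractions_spaced are illustrative and not part of the original answer:

```python
import re

# Mapping taken from the question; the constant name is hypothetical.
FRACTION_WORDS = {
    '1/2': 'half',
    '1/4': 'quarter',
    '3/4': 'three quarters',
    '2/3': 'two thirds',
}

# Same pattern as in the answer above, compiled once.
FRACTION_RE = re.compile(r'(?<!\d)(?<!\d/)[0-9]/[0-9]{1,2}(?!/?\d)')

def replace_fractions_spaced(text):
    def repl(m):
        word = FRACTION_WORDS.get(m.group())
        if word is None:
            return m.group()  # unknown fraction: leave it untouched
        # Assumption: add a space only if a letter immediately follows the match.
        next_char = m.string[m.end():m.end() + 1]
        return word + ' ' if next_char.isalpha() else word
    return FRACTION_RE.sub(repl, text)

print(replace_fractions_spaced(
    "This is a 1/2 1/4. Press 1/2/3. He drove a car for 1/2hour."))
# => This is a half quarter. Press 1/2/3. He drove a car for half hour.
```

Fractions that are not in the dictionary are returned unchanged, so no stray space is added after them.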

Details

  • (?<!\d)(?<!\d/) - a digit or a digit and / are not allowed immediately on the left
  • [0-9]/[0-9]{1,2} - a digit, / and 1 or 2 digits
  • (?!/?\d) - immediately to the right, there should not be / + digit or just a digit.
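To see the effect of the lookarounds, the short sketch below compares the pattern from the question with the guarded one on the sample string. The variable names are illustrative only:

```python
import re

text = "This is a 1/2 1/4. Press 1/2/3. He drove a car for 1/2hour."

naive = r'[0-9]\/[1-9]{1,2}'                          # pattern from the question
guarded = r'(?<!\d)(?<!\d/)[0-9]/[0-9]{1,2}(?!/?\d)'  # pattern from the answer

naive_matches = re.findall(naive, text)
guarded_matches = re.findall(guarded, text)

print(naive_matches)    # ['1/2', '1/4', '1/2', '1/2']  (third hit is inside 1/2/3)
print(guarded_matches)  # ['1/2', '1/4', '1/2']         (1/2/3 is skipped)
```

The lookahead rejects the 1/2 in 1/2/3 because /3 follows it, and the second lookbehind rejects a match starting at the 2 because 1/ precedes it.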

Full code snippet :

import re
my_str = "This is a 1/2 1/4. Press 1/2/3. He drove a car for 1/2hour." # A free text

def replace_fractions(text):
    fraction_dict = {
        '1/2': 'half',
        '1/4': 'quarter',
        '3/4': 'three quarters',
        '2/3': 'two thirds',
    }
    return re.sub(r'(?<!\d)(?<!\d/)[0-9]/[0-9]{1,2}(?!/?\d)', lambda x: fraction_dict.get(x.group(), x.group()), text)

current_result = replace_fractions(my_str)
print(current_result)
# => This is a half quarter. Press 1/2/3. He drove a car for halfhour.
