
Regular expression to extract fractions

I am looking for a regex that matches fractions of the format [0-9]\/[1-9]{1,2} in a given string.

Below is an example:

my_str = "This is a 1/2 1/4. Press 1/2/3. He drove a car for 1/2hour." # A free text

def replace_fractions(text):
    fraction_dict = {
        '1/2': 'half',
        '1/4': 'quarter',
        '3/4': 'three quarters',
        '2/3': 'two thirds',
    }
    _tmp = ' '.join([fraction_dict.get(w, w).strip() for w in text.split()])
    return _tmp

current_result = replace_fractions("This is a 1/2 1/4. Press 1/2/3. He drove a car for 1/2hour.")

current_result:

"This is a half 1/4. Press 1/2/3. He drove a car for 1/2hour."

expected_result:

"This is a half quarter. Press 1/2/3. He drove a car for half hour."

It is clear that a regex is needed to handle cases like 1/2/3, 1/4. or 1/2hour.
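To see why the dictionary lookup in the question only catches the first fraction, it helps to look at the tokens that str.split() produces for the sample sentence. A minimal sketch:

```python
# Whitespace splitting keeps punctuation and glued-on words attached to each fraction.
text = "This is a 1/2 1/4. Press 1/2/3. He drove a car for 1/2hour."
tokens = text.split()
print(tokens)
# ['This', 'is', 'a', '1/2', '1/4.', 'Press', '1/2/3.', 'He', 'drove', 'a', 'car', 'for', '1/2hour.']
```

Only '1/2' equals a key of fraction_dict; '1/4.' and '1/2hour.' carry trailing characters, so fraction_dict.get(w, w) returns them unchanged.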

But [0-9]\/[1-9]{1,2} matches everything, including the 1/2 at the start of 1/2/3. What is the right regex to handle these cases?
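A quick way to confirm the problem is to list everything the question's pattern finds in the sample sentence. This sketch only uses re.findall from the standard library:

```python
import re

text = "This is a 1/2 1/4. Press 1/2/3. He drove a car for 1/2hour."
# The pattern from the question: no checks on what surrounds the fraction.
naive = re.compile(r'[0-9]\/[1-9]{1,2}')
print(naive.findall(text))
# ['1/2', '1/4', '1/2', '1/2']
```

The third '1/2' is the prefix of 1/2/3, which is exactly the match that should be rejected, so the pattern needs context checks on both sides.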

Note: the regex only needs to handle the cases above. All edge cases can be ignored (or will be added in a later edit after expert comments).

You can use the following return statement in your function:

return re.sub(r'(?<!\d)(?<!\d/)[0-9]/[0-9]{1,2}(?!/?\d)', lambda x: fraction_dict.get(x.group(), x.group()), text)

See the Python demo. Note that the space between half and hour is missing, because it was missing in the input. You would need to add more logic to insert the space only where it is expected.
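One possible way to add that logic is sketched below, under the assumption that a space should be inserted only when a replaced fraction is immediately followed by a letter (as in 1/2hour). The names replace_fractions_spaced, FRACTION_WORDS and FRACTION_RE are hypothetical; the dictionary and the regex are the ones from the question and answer:

```python
import re

FRACTION_WORDS = {
    '1/2': 'half',
    '1/4': 'quarter',
    '3/4': 'three quarters',
    '2/3': 'two thirds',
}
# Same pattern as in the answer: a lone fraction, not part of an n/n/n sequence.
FRACTION_RE = re.compile(r'(?<!\d)(?<!\d/)[0-9]/[0-9]{1,2}(?!/?\d)')

def replace_fractions_spaced(text):
    def repl(m):
        frac = m.group()
        word = FRACTION_WORDS.get(frac)
        if word is None:
            # Unknown fraction: leave it exactly as it was, no space added.
            return frac
        # Assumption: a letter right after the fraction means a missing space.
        end = m.end()
        if end < len(m.string) and m.string[end].isalpha():
            word += ' '
        return word
    return FRACTION_RE.sub(repl, text)

print(replace_fractions_spaced("This is a 1/2 1/4. Press 1/2/3. He drove a car for 1/2hour."))
# This is a half quarter. Press 1/2/3. He drove a car for half hour.
```

Checking the following character inside the replacement function keeps the regex itself unchanged; the "followed by a letter" rule is only a heuristic and may need adjusting for other inputs.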

Details

  • (?<!\d)(?<!\d/) - neither a digit nor a digit followed by / is allowed immediately on the left
  • [0-9]/[0-9]{1,2} - a digit, /, and 1 or 2 digits
  • (?!/?\d) - immediately on the right, there must be neither / followed by a digit nor just a digit.
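The same pattern can be written with re.VERBOSE so that each part carries the explanation from the list above. This is only a restatement of the answer's regex (the name FRACTION_RE is illustrative), followed by a few hand-picked inputs that exercise each lookaround:

```python
import re

# The answer's pattern; re.VERBOSE ignores the whitespace and the # comments.
FRACTION_RE = re.compile(r"""
    (?<!\d)           # no digit immediately to the left
    (?<!\d/)          # no digit followed by / immediately to the left
    [0-9]/[0-9]{1,2}  # one digit, a slash, then one or two digits
    (?!/?\d)          # no digit, and no / followed by a digit, to the right
""", re.VERBOSE)

for s in ["1/2", "1/23", "11/2", "1/234", "1/2/3", "1/2hour", "x3/4."]:
    print(f"{s!r:>10} -> {FRACTION_RE.findall(s)}")
```

Under these rules 11/2, 1/234 and 1/2/3 produce no match, while 1/23, 1/2hour and x3/4. still yield a single fraction.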

Full code snippet:

import re
my_str = "This is a 1/2 1/4. Press 1/2/3. He drove a car for 1/2hour." # A free text

def replace_fractions(text):
    fraction_dict = {
        '1/2': 'half',
        '1/4': 'quarter',
        '3/4': 'three quarters',
        '2/3': 'two thirds',
    }
    return re.sub(r'(?<!\d)(?<!\d/)[0-9]/[0-9]{1,2}(?!/?\d)', lambda x: fraction_dict.get(x.group(), x.group()), text)

current_result = replace_fractions("This is a 1/2 1/4. Press 1/2/3. He drove a car for 1/2hour.")
print(current_result)
# => This is a half quarter. Press 1/2/3. He drove a car for halfhour.
