
A regular expression using a list of words

I'm using Python.

I have some strings:

'1 banana', '100 g of sugar', '1 cup of flour'

I need to separate the food from the quantity. I have an array of quantity types:

quantities = ['g', 'cup', 'kg', 'L']
altern = '|'.join(quantities)

Using a regular expression, I would like to get, for example, '1 cup of' and 'flour' for '1 cup of flour', and '1' and 'banana' for '1 banana'.

I have written this regexp to match the quantity part of the strings above:

\d{1,3}\s<altern>?\s?(\bof\b)?

But I'm very unsure about this, particularly about how to introduce the altern variable into the regular expression.

I think your quantities are really units, so I took the liberty of fixing this misnomer. I propose using named groups to make the output easier to understand.

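import re

units = ['g', 'cup', 'kg', 'L']
# Escape each unit so it is always matched literally.
anyUnitRE = '|'.join(re.escape(u) for u in units)

inputs = ['1 banana', '100 g of sugar', '1 cup of flour']

for text in inputs:  # "text" avoids shadowing the builtin input()
    m = re.match(
        r'(?P<amount>\d{1,3})\s*'
        # \b keeps a unit from matching the start of a word such as "grape"
        r'(?P<unit>((' + anyUnitRE + r')\b)?)\s*'
        r'(?P<preposition>(of\b)?)\s*'
        r'(?P<name>.*)', text)
    print(m and m.groupdict())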

The output will look something like this (the key order may differ):

{'preposition': '', 'amount': '1', 'name': 'banana', 'unit': ''}
{'preposition': 'of', 'amount': '100', 'name': 'sugar', 'unit': 'g'}
{'preposition': 'of', 'amount': '1', 'name': 'flour', 'unit': 'cup'}

So you can do something like this:

if m.groupdict()['name'] == 'sugar':
  …
amount = int(m.groupdict()['amount'])
unit = m.groupdict()['unit']
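The question asks for two parts, such as '1 cup of' and 'flour', so the named groups can be joined back together. The following is only a sketch under some assumptions: split_ingredient and PATTERN are hypothetical names that do not appear in the answer above, and the pattern adds word boundaries so that a unit such as 'g' does not match the start of a word like 'grape'.

```python
import re

units = ['g', 'cup', 'kg', 'L']
# re.escape guards against units that contain regex metacharacters.
any_unit = '|'.join(re.escape(u) for u in units)

# Assumption: same shape as the answer's pattern, plus \b word boundaries.
PATTERN = re.compile(
    r'(?P<amount>\d{1,3})\s*'
    r'(?P<unit>(?:(?:' + any_unit + r')\b)?)\s*'
    r'(?P<preposition>(?:of\b)?)\s*'
    r'(?P<name>.*)')


def split_ingredient(text):
    """Return (quantity, food), or None if text does not start with a number."""
    m = PATTERN.match(text)
    if m is None:
        return None
    # Join the non-empty quantity pieces with single spaces.
    parts = [m.group('amount'), m.group('unit'), m.group('preposition')]
    quantity = ' '.join(p for p in parts if p)
    return quantity, m.group('name')


for s in ['1 banana', '100 g of sugar', '1 cup of flour']:
    print(split_ingredient(s))
```

For the three strings from the question this should print ('1', 'banana'), ('100 g of', 'sugar') and ('1 cup of', 'flour').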

I think you can use this:

"(.*?) (\w*)$"

Then use \1 for the first part and \2 for the second part.
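In Python, this pattern might be applied as in the sketch below, where \1 and \2 correspond to group(1) and group(2). The variable names are illustrative only. Because the second group is a single run of word characters at the end of the string, this approach assumes the food is always one word.

```python
import re

# Lazily take everything up to the last space, then the final word.
pattern = re.compile(r'(.*?) (\w*)$')

for s in ['1 banana', '100 g of sugar', '1 cup of flour']:
    m = pattern.match(s)
    if m:
        quantity, food = m.group(1), m.group(2)
        print(repr(quantity), repr(food))
```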


And for a better regex:

"^((?=.*of)((.*of)(.*)))|((?!.*of)(\d+)(.*))$"

Then use \3 or \6 for the first part and \4 or \7 for the second part, depending on which alternative matched.
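A sketch of how the two alternatives could be read in Python follows. The helper name split_on_of is hypothetical, and stripping the leading space from the second part is an added assumption about the desired result. Only one alternative can match, so the groups of the other one are None.

```python
import re

pattern = re.compile(r'^((?=.*of)((.*of)(.*)))|((?!.*of)(\d+)(.*))$')


def split_on_of(text):
    """Return (first part, second part), or None if nothing matches."""
    m = pattern.match(text)
    if m is None:
        return None
    if m.group(3) is not None:
        # The string contains "of": groups 3 and 4 are filled.
        return m.group(3), m.group(4).strip()
    # No "of": groups 6 and 7 hold the number and the rest.
    return m.group(6), m.group(7).strip()


for s in ['1 banana', '100 g of sugar', '1 cup of flour']:
    print(split_on_of(s))
```

Note that the pattern looks for the letters "of" anywhere in the string, not only as a separate word, so a food such as "coffee" would be split in the wrong place.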

You can try this code:

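import re

lst = ['1 banana', '100 g of sugar', '1 cup of flour']
quantities = ['g', 'cup', 'kg', 'L']
altern = '|'.join(quantities)
# Group 1 is the number; group 2 is the optional unit (with an optional
# plural "s"), an optional "of", and the next word.
r = r'(\d{1,3})\s*((?:%s)?s?(?:\s*\bof\b)?\s*\S+)' % altern
for x in lst:
    print(re.findall(r, x))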


Output:

[('1', 'banana')]
[('100', 'g of sugar')]
[('1', 'cup of flour')]

Why do you want to do this with regular expressions? You can use Python's string-splitting methods instead:

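def qsplit(a):
    """Return a [quantity, ingredient] list, or None for an empty string."""

    if not a:
        return None

    # No leading digit: assume a quantity of "0".
    if a[0] not in "0123456789":
        return ["0", a]

    # Split at the first " of ", e.g. "1 cup of flour" -> ["1 cup", "flour"].
    if " of " in a:
        return a.split(" of ", 1)

    # Otherwise split at the first whitespace, e.g. "1 banana" -> ["1", "banana"].
    return a.split(None, 1)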
