Regular expression using a list of words

Question

I'm using Python.

I have some strings:

'1 banana', '100 g of sugar', '1 cup of flour'

I need to separate the food from the quantity. I have a list of quantity types:

quantities = ['g', 'cup', 'kg', 'L']
altern = '|'.join(quantities)

Using a regular expression, I would like to get, for example, 'flour' and '1 cup of' for '1 cup of flour', and '1' and 'banana' for '1 banana'.

I have written this regexp to match the quantity part of the strings above:

\d{1,3}\s<altern>?\s?(\bof\b)?

But I'm very unsure about this, particularly about how to introduce the altern variable into the regular expression.

Answer 1

I think your quantities are really units, so I took the liberty of fixing this misnomer. I suggest using named groups to make the output easier to understand.

import re

units = [ 'g', 'cup', 'kg', 'L' ]
anyUnitRE = '|'.join(units)

inputs = [ '1 banana', '100 g of sugar', '1 cup of flour' ]

for text in inputs:
  # Each named group captures one part of the string.
  m = re.match(
    r'(?P<amount>\d{1,3})\s*'
    r'(?P<unit>(' + anyUnitRE + r')?)\s*'
    r'(?P<preposition>(of)?)\s*'
    r'(?P<name>.*)', text)
  print(m and m.groupdict())

The output will look something like this:

{'preposition': '', 'amount': '1', 'name': 'banana', 'unit': ''}
{'preposition': 'of', 'amount': '100', 'name': 'sugar', 'unit': 'g'}
{'preposition': 'of', 'amount': '1', 'name': 'flour', 'unit': 'cup'}

So you can do something like this:

if m.groupdict()['name'] == 'sugar':
  …
amount = int(m.groupdict()['amount'])
unit = m.groupdict()['unit']
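One caveat with joining the units directly: a short unit such as 'g' can match the first letter of a food name (for example '1 grape'). The following is a minimal sketch of one way to guard against that, assuming word boundaries after the unit and after "of" are acceptable for your data. The names UNITS, PATTERN and split_ingredient are hypothetical and not part of the answer above.

```python
import re

UNITS = ['g', 'cup', 'kg', 'L']

# Escape each unit in case it contains regex metacharacters, and try
# longer units first so that 'kg' is preferred over 'g'.
UNIT_ALTERNATION = '|'.join(
    re.escape(u) for u in sorted(UNITS, key=len, reverse=True))

PATTERN = re.compile(
    r'(?P<amount>\d{1,3})\s*'
    # \b requires the unit to end at a word boundary ('g' but not 'grape').
    r'(?:(?P<unit>' + UNIT_ALTERNATION + r')\b\s*)?'
    r'(?:(?P<preposition>of)\b\s*)?'
    r'(?P<name>.*)'
)

def split_ingredient(text):
    """Return a dict of amount, unit, preposition and name, or None."""
    m = PATTERN.match(text)
    if m is None:
        return None
    # Optional groups that did not take part in the match are None;
    # normalise them to '' to mirror the output shown above.
    return {key: value or '' for key, value in m.groupdict().items()}

for s in ['1 banana', '100 g of sugar', '1 cup of flour', '1 grape']:
    print(split_ingredient(s))
```

With this variant '1 grape' keeps 'grape' as the name instead of splitting off 'g' as a unit.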

Answer 2

I think you can use this:

"(.*?) (\w*)$"

Then use \\1 for the first part and \\2 for the second part.

[Regex Demo]

And for a better regex:

"^((?=.*of)((.*of)(.*)))|((?!.*of)(\d+)(.*))$"

Then use \\3 or \\6 for the first part and \\4 or \\7 for the second part, depending on which alternative matched.
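As a rough illustration of how those group numbers could be read back in Python, here is a small sketch. The function name split_by_of is hypothetical, and stripping the surrounding whitespace is an added assumption rather than part of the regex itself.

```python
import re

# The "better" regex from this answer, unchanged.
PATTERN = re.compile(r"^((?=.*of)((.*of)(.*)))|((?!.*of)(\d+)(.*))$")

def split_by_of(text):
    """Return (first part, second part), or None if nothing matches."""
    m = PATTERN.match(text)
    if m is None:
        return None
    if m.group(3) is not None:
        # First alternative: the string contains "of".
        return m.group(3).strip(), m.group(4).strip()
    # Second alternative: no "of", so split after the leading digits.
    return m.group(6).strip(), m.group(7).strip()

for s in ['1 banana', '100 g of sugar', '1 cup of flour']:
    print(split_by_of(s))
```

Note that the pattern looks for the bare letters "of" with no word boundaries, so a food name that contains them, such as "coffee", is split in the wrong place.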

Answer 3

You can try this code:

import re
lst = ['1 banana', '100 g of sugar', '1 cup of flour']
quantities = ['g', 'cup', 'kg', 'L']
altern = '|'.join(quantities)
r = r'(\d{1,3})\s*((?:%s)?s?(?:\s*\bof\b)?\s*\S+)'%(altern)
for x in lst:
    print(re.findall(r, x))

See demo

Output:

[('1', 'banana')]
[('100', 'g of sugar')]
[('1', 'cup of flour')]

Answer 4

Why do you want to do this with regular expressions? You can use Python's string splitting functions instead:

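def qsplit(a):
    """Return [quantity, ingredient], or None for an empty string."""

    if not a:
        return None

    # No leading digit: treat the whole string as the ingredient.
    if a[0] not in "0123456789":
        return ["0", a]

    # "100 g of sugar" -> ["100 g", "sugar"]
    if " of " in a:
        return a.split(" of ", 1)

    # "1 banana" -> ["1", "banana"]
    return a.split(None, 1)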
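To make the behaviour concrete, the sketch below repeats the function and runs it on the strings from the question, plus one input without a leading number. The sample list and the loop are illustrative additions, not part of the answer.

```python
def qsplit(a):
    """Return [quantity, ingredient], or None for an empty string."""
    if not a:
        return None
    if a[0] not in "0123456789":
        return ["0", a]
    if " of " in a:
        return a.split(" of ", 1)
    return a.split(None, 1)

samples = ['1 banana', '100 g of sugar', '1 cup of flour', 'flour']
for s in samples:
    print(s, '->', qsplit(s))
```

Unlike the regex answers, this drops the word "of" entirely: '1 cup of flour' becomes ['1 cup', 'flour'] rather than '1 cup of' and 'flour'.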

4 answers

Answer 1
4 (accepted) 2015-08-03 13:45:39

Answer 2
2 2015-08-03 13:35:56

Answer 3
0 2015-08-03 13:36:37

Answer 4
0 2015-08-03 13:46:55
