A regular expression using a list of words
I'm using Python.
I have some strings:
'1 banana', '100 g of sugar', '1 cup of flour'
I need to separate the food from the quantity. I have a list of quantity types:
quantities = ['g', 'cup', 'kg', 'L']
altern = '|'.join(quantities)
Using a regular expression, I would like to get, for example, '1 cup of' and 'flour' for '1 cup of flour', and '1' and 'banana' for '1 banana'.
I have written this regexp to match the quantity part of the strings above:
\d{1,3}\s<altern>?\s?(\bof\b)?
But I'm very unsure about it, particularly about how to introduce the altern variable into the regular expression.
I think your quantities are really units, so I took the liberty of fixing this misnomer. I propose using named groups to make the output easier to understand.
import re

units = ['g', 'cup', 'kg', 'L']
anyUnitRE = '|'.join(units)  # 'g|cup|kg|L'
inputs = ['1 banana', '100 g of sugar', '1 cup of flour']
for text in inputs:  # 'text' avoids shadowing the built-in input()
    m = re.match(
        r'(?P<amount>\d{1,3})\s*'
        r'(?P<unit>(' + anyUnitRE + r')?)\s*'
        r'(?P<preposition>(of)?)\s*'
        r'(?P<name>.*)', text)
    # Print the named groups as a dict, or None if nothing matched
    print(m and m.groupdict())
The output will look something like this (the key order may vary with the Python version):
{'amount': '1', 'unit': '', 'preposition': '', 'name': 'banana'}
{'amount': '100', 'unit': 'g', 'preposition': 'of', 'name': 'sugar'}
{'amount': '1', 'unit': 'cup', 'preposition': 'of', 'name': 'flour'}
So you can do something like this:
if m.groupdict()['name'] == 'sugar':
    …
amount = int(m.groupdict()['amount'])
unit = m.groupdict()['unit']
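One caveat with the pattern above: the unit group is not followed by a word boundary, so an input such as '1 grape' would be read as unit 'g' and name 'rape'. Below is a minimal sketch of a stricter variant, not a definitive fix. The names UNIT_ALTERNATION, INGREDIENT_RE and parse_ingredient are my own, and I'm assuming units should be matched as whole words only. Note that with this layout an absent unit or preposition comes back as None rather than ''.

```python
import re

units = ['g', 'cup', 'kg', 'L']

# re.escape guards against units that contain regex metacharacters;
# sorting longest-first makes 'kg' get tried before 'g'.
UNIT_ALTERNATION = '|'.join(
    re.escape(u) for u in sorted(units, key=len, reverse=True))

INGREDIENT_RE = re.compile(
    r'(?P<amount>\d{1,3})\s*'
    r'(?:(?P<unit>' + UNIT_ALTERNATION + r')\b\s*)?'  # whole-word unit only
    r'(?:(?P<preposition>of)\b\s*)?'                  # whole-word 'of' only
    r'(?P<name>.*)'
)


def parse_ingredient(text):
    """Return the named groups as a dict, or None if text has no leading number."""
    m = INGREDIENT_RE.match(text)
    return m.groupdict() if m else None


for text in ['1 banana', '100 g of sugar', '1 cup of flour', '1 grape']:
    print(parse_ingredient(text))
```

For '1 grape' this reports the whole word 'grape' as the name, with unit set to None.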
I think you can use this:
"(.*?) (\w*)$"
Then use \\1 for the first part and \\2 for the second part.
And for a better regex:
"^((?=.*of)((.*of)(.*)))|((?!.*of)(\d+)(.*))$"
Then use \\3 or \\6 for the first part and \\4 or \\7 for the second part, depending on which alternative matched.
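In Python the backreferences correspond to m.group(3), m.group(6) and so on. Here is a rough sketch of how the second regex might be applied. The helper name split_quantity is mine, and stripping the leading space from the second part is my own assumption about what is wanted.

```python
import re

# The regex from this answer, unchanged.
PATTERN = re.compile(r"^((?=.*of)((.*of)(.*)))|((?!.*of)(\d+)(.*))$")


def split_quantity(text):
    """Return (first_part, second_part), or None if the regex does not match."""
    m = PATTERN.match(text)
    if m is None:
        return None
    if m.group(3) is not None:
        # The 'of' alternative matched: groups 3 and 4 hold the two parts.
        return m.group(3), m.group(4).strip()
    # The alternative without 'of' matched: groups 6 and 7 hold the two parts.
    return m.group(6), m.group(7).strip()


for text in ['1 banana', '100 g of sugar', '1 cup of flour']:
    print(split_quantity(text))
```

Be aware that this regex looks for the letters "of" anywhere in the string and matches greedily, so a food name that contains them (for example '1 cup of coffee') gets split inside the word.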
You can try this code:
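import re

lst = ['1 banana', '100 g of sugar', '1 cup of flour']
quantities = ['g', 'cup', 'kg', 'L']
altern = '|'.join(quantities)  # 'g|cup|kg|L'
# Group 1: the number. Group 2: optional unit (with optional plural 's'),
# optional 'of', and the next word.
r = r'(\d{1,3})\s*((?:%s)?s?(?:\s*\bof\b)?\s*\S+)' % altern
for x in lst:
    print(re.findall(r, x))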
Output:
[('1', 'banana')]
[('100', 'g of sugar')]
[('1', 'cup of flour')]
Why do you want to do this with regular expressions? You can use Python's string splitting functions instead:
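def qsplit(a):
    """Return [quantity, ingredient], or None for an empty string."""
    if not a:
        return None
    if a[0] not in "0123456789":
        # No leading number: report a quantity of "0"
        return ["0", a]
    if " of " in a:
        # '100 g of sugar' -> ['100 g', 'sugar'] (the ' of ' is dropped)
        return a.split(" of ", 1)
    # '1 banana' -> ['1', 'banana']
    return a.split(None, 1)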
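Note that splitting on " of " drops the word "of", whereas the question asked for '1 cup of' as the quantity part. If that matters, a variant along these lines might do. It is only a sketch: the name split_keep_of is made up, and returning an empty quantity for strings without a leading number is my own choice.

```python
def split_keep_of(text):
    """Split text into (quantity, food), keeping a trailing 'of' on the quantity."""
    head, sep, tail = text.partition(" of ")
    if sep:
        # '1 cup of flour' -> ('1 cup of', 'flour')
        return head + " of", tail
    parts = text.split(None, 1)
    if len(parts) == 2 and parts[0][0].isdigit():
        # '1 banana' -> ('1', 'banana')
        return parts[0], parts[1]
    # No recognizable quantity: return the whole string as the food.
    return "", text


for text in ['1 banana', '100 g of sugar', '1 cup of flour']:
    print(split_keep_of(text))
```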