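Regex in Python 3: match everything after a number or optional period but before an optional comma
I'm trying to return the ingredients from a recipe, without any measurements or directions. The ingredients are lists that look like this: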
['1 medium tomato, cut into 8 wedges',
'4 c. torn mixed salad greens',
'1/2 small red onion, sliced and separated into rings',
'1/4 small cucumber, sliced',
'1/4 c. sliced pitted ripe olives',
'2 Tbsp. reduced-calorie Italian salad dressing',
'2 Tbsp. lemon juice',
'1 Tbsp. water',
'1/2 tsp. dried mint, crushed',
'1/4 c. crumbled Feta cheese or 2 Tbsp. crumbled Blue cheese']
I want to return the following list:
['medium tomato',
'torn mixed salad greens',
'small red onion',
'small cucumber',
'sliced pitted ripe olives',
'reduced-calorie Italian salad dressing',
'lemon juice',
'water',
'dried mint',
'crumbled Blue cheese']
The closest pattern I've found is:
pattern = r'[\s\d\.]* ([^\,]+).*'
But testing it with:
for ing in ingredients:
    print(re.findall(pattern, ing))
still returns the measurement abbreviation and its period, for example:
['c. torn mixed salad greens']
whereas
pattern = r'(?<=\. )[^.]*$'
fails to capture entries that have no period, and keeps the comma and the text after it when both are present, i.e.:
[]
['torn mixed salad greens']
[]
[]
['sliced pitted ripe olives']
['reduced-calorie Italian salad dressing']
['lemon juice']
['water']
['dried mint, crushed']
['crumbled Blue cheese']
Thanks in advance!
The problem is that you are pairing the digits with the dot.
\s\d*\.?
should match the number correctly (with or without a dot).
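To make that diagnosis concrete, here is a small sketch that runs the question's first pattern on one line and shows where the leading character class stops. The variable names are only illustrative; the pattern and the sample line are taken from the question.

```python
import re

# Pattern from the question: digits, whitespace and dots share one class.
question_pattern = r'[\s\d\.]* ([^\,]+).*'
line = '4 c. torn mixed salad greens'

m = re.search(question_pattern, line)

# The class can only consume "4" before the required space; it stops at
# the letter "c", so the unit abbreviation ends up inside the capture group.
prefix = line[:m.start(1)]
captured = m.group(1)

print(repr(prefix))    # '4 '
print(repr(captured))  # 'c. torn mixed salad greens'
```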
You can use this pattern:
for ing in ingredients:
    print(re.search(r'(?i)[a-z][^.,]*(?![^,])', ing).group())
Pattern details:
(?i)          # make the pattern case-insensitive (since Python 3.11 a global
              # inline flag must be at the very start of the pattern)
[a-z][^.,]*   # a substring that starts with a letter and that doesn't contain
              # a period or a comma
(?![^,])      # not followed by a character that is not a comma
              # (in other words, followed by a comma or the end of the string)
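As a self-contained check, the sketch below applies the same pattern to the full list from the question. Passing re.IGNORECASE as a flag instead of writing (?i) inline is a substitution made here for portability across Python versions; the helper name ingredient_name is hypothetical.

```python
import re

ingredients = [
    '1 medium tomato, cut into 8 wedges',
    '4 c. torn mixed salad greens',
    '1/2 small red onion, sliced and separated into rings',
    '1/4 small cucumber, sliced',
    '1/4 c. sliced pitted ripe olives',
    '2 Tbsp. reduced-calorie Italian salad dressing',
    '2 Tbsp. lemon juice',
    '1 Tbsp. water',
    '1/2 tsp. dried mint, crushed',
    '1/4 c. crumbled Feta cheese or 2 Tbsp. crumbled Blue cheese',
]

# Same pattern as in the answer, with the flag passed separately.
NAME = re.compile(r'[a-z][^.,]*(?![^,])', re.IGNORECASE)

def ingredient_name(line):
    """Return the first run of text that starts with a letter, contains no
    period or comma, and is followed by a comma or the end of the string."""
    m = NAME.search(line)
    return m.group() if m else None

names = [ingredient_name(line) for line in ingredients]
for name in names:
    print(name)
```

Abbreviations such as "Tbsp" are rejected because the character after them is a period, which the lookahead does not allow; that is also why the last line yields only the ingredient after the final period.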
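I suggest using the following regex to find the substrings you are not interested in and replace them. Because it spells out the units of measure, it also handles non-abbreviated units. The pattern needs the case-insensitive flag to match units written like Tbsp.
\s*(?:(?:(?:[0-9]\s*)?[0-9]+\/)?[0-9]+\s*(?:(?:c\.|cups?|tsp\.|teaspoon|tbsp\.|tablespoon)\s*)?)|,.*|.*\bor\b
Replace with: nothing
Live demo
Showing how this matches:
https://regex101.com/r/qV5iR8/3
Sample strings
Note the double ingredient on the last line, separated by "or"; according to the OP, they want to drop the first ingredient.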
1 medium tomato, cut into 8 wedges
4 c. torn mixed salad greens
1/2 small red onion, sliced and separated into rings
1/4 small cucumber, sliced
1 1/4 c. sliced pitted ripe olives
2 Tbsp. reduced-calorie Italian salad dressing
2 Tbsp. lemon juice
1 Tbsp. water
1/2 tsp. dried mint, crushed
1/4 c. crumbled Feta cheese or 2 Tbsp. crumbled Blue cheese
After replacement
medium tomato
torn mixed salad greens
small red onion
small cucumber
sliced pitted ripe olives
reduced-calorie Italian salad dressing
lemon juice
water
dried mint
crumbled Blue cheese
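In Python, this find-and-replace could be sketched with re.sub as shown below. This is a minimal sketch, not the answer's own code: the names UNWANTED, samples and cleaned are illustrative, the slash is written unescaped (equivalent in Python), and re.IGNORECASE is assumed so that "Tbsp." matches the lowercase tbsp\. alternative.

```python
import re

# The answer's pattern as a raw string, so each backslash is written once.
UNWANTED = re.compile(
    r'\s*(?:(?:(?:[0-9]\s*)?[0-9]+/)?[0-9]+\s*'
    r'(?:(?:c\.|cups?|tsp\.|teaspoon|tbsp\.|tablespoon)\s*)?)'
    r'|,.*'
    r'|.*\bor\b',
    re.IGNORECASE,
)

samples = [
    '1 medium tomato, cut into 8 wedges',
    '4 c. torn mixed salad greens',
    '1/2 small red onion, sliced and separated into rings',
    '1/4 small cucumber, sliced',
    '1 1/4 c. sliced pitted ripe olives',
    '2 Tbsp. reduced-calorie Italian salad dressing',
    '2 Tbsp. lemon juice',
    '1 Tbsp. water',
    '1/2 tsp. dried mint, crushed',
    '1/4 c. crumbled Feta cheese or 2 Tbsp. crumbled Blue cheese',
]

# Delete every match: quantities with optional units, anything from the
# first comma onward, and everything up to the last standalone "or".
cleaned = [UNWANTED.sub('', line) for line in samples]
for item in cleaned:
    print(item)
```

Because the units are listed explicitly, a unit that is not in the alternation (for example "oz.") would be left in the output; extending the list is the natural way to cover more recipes.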
NODE                       EXPLANATION
----------------------------------------------------------------------
\s*                        whitespace (\n, \r, \t, \f, and " "), 0 or more times (greedy)
(?:                        group, but do not capture:
  (?:                      group, but do not capture (optional, greedy):
    (?:                    group, but do not capture (optional, greedy):
      [0-9]                any character of: '0' to '9'
      \s*                  whitespace, 0 or more times (greedy)
    )?                     end of grouping
    [0-9]+                 any character of: '0' to '9', 1 or more times (greedy)
    \/                     '/'
  )?                       end of grouping
  [0-9]+                   any character of: '0' to '9', 1 or more times (greedy)
  \s*                      whitespace, 0 or more times (greedy)
  (?:                      group, but do not capture (optional, greedy):
    (?:                    group, but do not capture:
      c                    'c'
      \.                   '.'
     |                     OR
      cup                  'cup'
      s?                   's' (optional, greedy)
     |                     OR
      tsp                  'tsp'
      \.                   '.'
     |                     OR
      teaspoon             'teaspoon'
     |                     OR
      tbsp                 'tbsp'
      \.                   '.'
     |                     OR
      tablespoon           'tablespoon'
    )                      end of grouping
    \s*                    whitespace, 0 or more times (greedy)
  )?                       end of grouping
)                          end of grouping
|                          OR
,                          ','
.*                         any character except \n, 0 or more times (greedy)
|                          OR
.*                         any character except \n, 0 or more times (greedy)
\b                         the boundary between a word char (\w) and something that is not a word char
or                         'or'
\b                         the boundary between a word char (\w) and something that is not a word char
----------------------------------------------------------------------
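The breakdown above maps fairly directly onto Python's re.VERBOSE mode, where whitespace inside the pattern is ignored and # starts a comment. The following is a sketch under that assumption; the name STRIP and the comments are illustrative, and re.IGNORECASE is added so that "Tbsp." is covered.

```python
import re

STRIP = re.compile(r"""
    \s*                       # leading whitespace
    (?:
        (?:
            (?: [0-9] \s* )?  # optional whole part of a mixed number
            [0-9]+ /          # numerator and slash
        )?
        [0-9]+                # denominator, or a plain integer quantity
        \s*
        (?:                   # optional unit of measure
            (?: c\. | cups? | tsp\. | teaspoon | tbsp\. | tablespoon )
            \s*
        )?
    )
  | , .*                      # a comma and everything after it
  | .* \b or \b               # everything up to the last standalone "or"
""", re.VERBOSE | re.IGNORECASE)

mixed = STRIP.sub('', '1 1/4 c. sliced pitted ripe olives')
either = STRIP.sub('', '1/4 c. crumbled Feta cheese or 2 Tbsp. crumbled Blue cheese')
print(mixed)
print(either)
```

Keeping the pattern in verbose form makes it easier to add further units to the alternation without losing track of the nesting.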