Regex in Python 3: match everything after a number or optional period but before an optional comma
I'm trying to return the ingredients from a recipe without any measurements or directions. The ingredients are in a list like this:
ingredients = ['1 medium tomato, cut into 8 wedges',
'4 c. torn mixed salad greens',
'1/2 small red onion, sliced and separated into rings',
'1/4 small cucumber, sliced',
'1/4 c. sliced pitted ripe olives',
'2 Tbsp. reduced-calorie Italian salad dressing',
'2 Tbsp. lemon juice',
'1 Tbsp. water',
'1/2 tsp. dried mint, crushed',
'1/4 c. crumbled Feta cheese or 2 Tbsp. crumbled Blue cheese']
I want to get back the following list:
['medium tomato',
'torn mixed salad greens',
'small red onion',
'small cucumber',
'sliced pitted ripe olives',
'reduced-calorie Italian salad dressing',
'lemon juice',
'water',
'dried mint',
'crumbled Blue cheese']
The closest pattern I've found is:
pattern = r'[\s\d\.]* ([^\,]+).*'
But when I test it:
import re
for ing in ingredients:
    print(re.findall(pattern, ing))
it keeps each measurement abbreviation along with its period, for example:
['c. torn mixed salad greens']
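By contrast, the pattern
pattern = r'(?<=\. )[^.]*$'
fails to capture the lines that have no period, and when a line has both a period and a comma it also captures the comma and everything after it, i.e.: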
[]
['torn mixed salad greens']
[]
[]
['sliced pitted ripe olives']
['reduced-calorie Italian salad dressing']
['lemon juice']
['water']
['dried mint, crushed']
['crumbled Blue cheese']
Thanks in advance!
The problem is that you are pairing the digits with the period.
\s\d*\.?
should match the numbers correctly (with or without the period).
You can use this pattern:
import re
for ing in ingredients:
    print(re.search(r'(?i)[a-z][^.,]*(?![^,])', ing).group())
Pattern details:
(?i)          # make the pattern case insensitive; since Python 3.11, global
              # inline flags such as (?i) must appear at the start of the pattern
[a-z][^.,]*   # a substring that starts with a letter and contains no period
              # or comma
(?![^,])      # not followed by a character that is not a comma
              # (in other words, followed by a comma or the end of the string)
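As a self-contained check, the sketch below applies the same pattern, with (?i) moved to the front, to the question's ingredient list. The helper name ingredient_name is hypothetical, and returning None for a line with no match is an added assumption: calling .group() directly on a failed re.search would raise AttributeError instead.

```python
import re

# The ingredient lines from the question.
ingredients = ['1 medium tomato, cut into 8 wedges',
               '4 c. torn mixed salad greens',
               '1/2 small red onion, sliced and separated into rings',
               '1/4 small cucumber, sliced',
               '1/4 c. sliced pitted ripe olives',
               '2 Tbsp. reduced-calorie Italian salad dressing',
               '2 Tbsp. lemon juice',
               '1 Tbsp. water',
               '1/2 tsp. dried mint, crushed',
               '1/4 c. crumbled Feta cheese or 2 Tbsp. crumbled Blue cheese']

# A run of text that starts with a letter, contains no period or comma, and is
# followed by a comma or the end of the string. (?i) lets [a-z] match capitals.
PATTERN = re.compile(r'(?i)[a-z][^.,]*(?![^,])')


def ingredient_name(line):
    """Return the first matching substring, or None if nothing matches (hypothetical helper)."""
    match = PATTERN.search(line)
    return match.group() if match else None


names = [ingredient_name(ing) for ing in ingredients]
print(names)
```

Because a candidate ending just before a period fails the lookahead, abbreviations such as "c." and "Tbsp." are skipped, and in the "Feta cheese or 2 Tbsp. ..." line the search only succeeds after the last period.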
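I'd suggest using the following regex to find the substrings you aren't interested in and replace them. Because it spells out the units of measure, it also handles units that are not abbreviated.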
\s*(?:(?:(?:[0-9]\s*)?[0-9]+\/)?[0-9]+\s*(?:(?:c\.|cups?|tsp\.|teaspoon|tbsp\.|tablespoon)\s*)?)|,.*|.*\bor\b
Replace with: nothing (an empty string)
Live demo showing how this matches:
https://regex101.com/r/qV5iR8/3
Sample text
Note that the last line contains two ingredients separated by "or"; according to the OP, the first of them should be dropped.
1 medium tomato, cut into 8 wedges
4 c. torn mixed salad greens
1/2 small red onion, sliced and separated into rings
1/4 small cucumber, sliced
1 1/4 c. sliced pitted ripe olives
2 Tbsp. reduced-calorie Italian salad dressing
2 Tbsp. lemon juice
1 Tbsp. water
1/2 tsp. dried mint, crushed
1/4 c. crumbled Feta cheese or 2 Tbsp. crumbled Blue cheese
After replacement
medium tomato
torn mixed salad greens
small red onion
small cucumber
sliced pitted ripe olives
reduced-calorie Italian salad dressing
lemon juice
water
dried mint
crumbled Blue cheese
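The same find-and-replace can be sketched in Python with re.sub. Two assumptions are made here: the pattern is written with single backslashes in a raw string, and re.IGNORECASE is set, because the unit alternatives are lowercase (tbsp\.) while the sample uses "Tbsp."; the regex101 demo's flags are not stated. The names NOISE and strip_measurements are hypothetical.

```python
import re

# The answer's sample lines (note the mixed number "1 1/4" on the fifth line).
sample = ['1 medium tomato, cut into 8 wedges',
          '4 c. torn mixed salad greens',
          '1/2 small red onion, sliced and separated into rings',
          '1/4 small cucumber, sliced',
          '1 1/4 c. sliced pitted ripe olives',
          '2 Tbsp. reduced-calorie Italian salad dressing',
          '2 Tbsp. lemon juice',
          '1 Tbsp. water',
          '1/2 tsp. dried mint, crushed',
          '1/4 c. crumbled Feta cheese or 2 Tbsp. crumbled Blue cheese']

# Three alternatives: a quantity with an optional unit, a comma and the rest of
# the line, or everything up to and including a standalone "or".
# re.IGNORECASE is an assumption so that "Tbsp." matches the "tbsp\." branch.
NOISE = re.compile(
    r'\s*(?:(?:(?:[0-9]\s*)?[0-9]+\/)?[0-9]+\s*'
    r'(?:(?:c\.|cups?|tsp\.|teaspoon|tbsp\.|tablespoon)\s*)?)'
    r'|,.*|.*\bor\b',
    re.IGNORECASE,
)


def strip_measurements(line):
    """Replace every unwanted piece of the line with nothing (hypothetical helper)."""
    return NOISE.sub('', line)


cleaned = [strip_measurements(line) for line in sample]
for name in cleaned:
    print(name)
```

None of the alternatives can match an empty string, so re.sub never inserts or removes anything at positions where no measurement, comma tail, or "or" prefix is present.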
NODE                          EXPLANATION
----------------------------------------------------------------------
\s*                           whitespace (\n, \r, \t, \f, and " "), 0 or more times, greedy
(?:                           group, but do not capture:
  (?:                         group, but do not capture (optional, greedy):
    (?:                       group, but do not capture (optional, greedy):
      [0-9]                   any character of: '0' to '9'
      \s*                     whitespace (\n, \r, \t, \f, and " "), 0 or more times, greedy
    )?                        end of grouping
    [0-9]+                    any character of: '0' to '9', 1 or more times, greedy
    \/                        '/'
  )?                          end of grouping
  [0-9]+                      any character of: '0' to '9', 1 or more times, greedy
  \s*                         whitespace (\n, \r, \t, \f, and " "), 0 or more times, greedy
  (?:                         group, but do not capture (optional, greedy):
    (?:                       group, but do not capture:
      c\.                     'c.'
      |                       OR
      cups?                   'cup' followed by an optional 's'
      |                       OR
      tsp\.                   'tsp.'
      |                       OR
      teaspoon                'teaspoon'
      |                       OR
      tbsp\.                  'tbsp.'
      |                       OR
      tablespoon              'tablespoon'
    )                         end of grouping
    \s*                       whitespace (\n, \r, \t, \f, and " "), 0 or more times, greedy
  )?                          end of grouping
)                             end of grouping
|                             OR
,                             ','
.*                            any character except \n, 0 or more times, greedy
|                             OR
.*                            any character except \n, 0 or more times, greedy
\b                            the boundary between a word char (\w) and something that is not a word char
or                            'or'
\b                            the boundary between a word char (\w) and something that is not a word char
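For maintenance, the same pattern can be laid out with re.VERBOSE so that each node from the explanation sits beside a comment. This is a sketch rather than the answer's own code; re.IGNORECASE is assumed so that "Tbsp." matches the lowercase tbsp\. alternative, and the name NOISE_VERBOSE is hypothetical.

```python
import re

# The answer's pattern in verbose form. Unescaped whitespace and "#" comments
# are ignored by re.VERBOSE, so the layout below is purely cosmetic.
NOISE_VERBOSE = re.compile(r"""
    \s*                              # leading whitespace
    (?:                              # a quantity:
        (?:(?:[0-9]\s*)? [0-9]+ \/)? #   optional whole number and numerator, e.g. '1 1/' or '1/'
        [0-9]+ \s*                   #   a whole number or a denominator
        (?:                          #   an optional unit:
            (?: c\. | cups? | tsp\. | teaspoon | tbsp\. | tablespoon )
            \s*
        )?
    )
  | ,.*                              # or: a comma and everything after it
  | .*\bor\b                         # or: everything up to and including a standalone 'or'
""", re.VERBOSE | re.IGNORECASE)

lines = ['4 c. torn mixed salad greens',
         '1 1/4 c. sliced pitted ripe olives',
         '1/4 c. crumbled Feta cheese or 2 Tbsp. crumbled Blue cheese']
results = [NOISE_VERBOSE.sub('', line) for line in lines]
print(results)
```

Because whitespace is ignored in verbose mode, a literal space inside such a pattern would have to be written as "\ " or "[ ]"; this pattern only uses \s, so nothing changes relative to the compact form.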