
Regex in Python 3: match everything after a number or optional period but before an optional comma

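I'm trying to return the ingredients from a recipe without any measurements or directions. The ingredients are a list, like the following: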

['1  medium tomato, cut into 8 wedges',
 '4  c. torn mixed salad greens',
 '1/2  small red onion, sliced and separated into rings',
 '1/4  small cucumber, sliced',
 '1/4  c. sliced pitted ripe olives',
 '2  Tbsp. reduced-calorie Italian salad dressing',
 '2  Tbsp. lemon juice',
 '1  Tbsp. water',
 '1/2  tsp. dried mint, crushed',
 '1/4  c. crumbled Feta cheese or 2 Tbsp. crumbled Blue cheese']

I would like to return the following list:

['medium tomato',
 'torn mixed salad greens',
 'small red onion',
 'small cucumber',
 'sliced pitted ripe olives',
 'reduced-calorie Italian salad dressing',
 'lemon juice',
 'water',
 'dried mint',
 'crumbled Blue cheese']

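The closest pattern I have found is:

pattern = r'[\s\d\.]* ([^\,]+).*'

But when tested with:

import re

for ing in ingredients:
    print(re.findall(pattern, ing))

it returns the measurement abbreviation (with its period) along with the ingredient, for example:

['c. torn mixed salad greens']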

A second attempt,

pattern = r'(?<=\. )[^.]*$'

fails to capture the instances without a period, and keeps the comma when both appear, i.e.:

[]
['torn mixed salad greens']
[]
[]
['sliced pitted ripe olives']
['reduced-calorie Italian salad dressing']
['lemon juice']
['water']
['dried mint, crushed']
['crumbled Blue cheese']

Thanks in advance!
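To make the two attempts above easy to re-check, here is a self-contained reproduction. It is a sketch: attempt_1, attempt_2 and the results_* names are illustrative, and the raw strings match exactly the same text as the original non-raw pattern strings.

```python
import re

ingredients = ['1  medium tomato, cut into 8 wedges',
               '4  c. torn mixed salad greens',
               '1/2  small red onion, sliced and separated into rings',
               '1/4  small cucumber, sliced',
               '1/4  c. sliced pitted ripe olives',
               '2  Tbsp. reduced-calorie Italian salad dressing',
               '2  Tbsp. lemon juice',
               '1  Tbsp. water',
               '1/2  tsp. dried mint, crushed',
               '1/4  c. crumbled Feta cheese or 2 Tbsp. crumbled Blue cheese']

# Attempt 1: skip a run of whitespace, digits and periods, then capture up to a comma.
# Unit abbreviations such as "c." or "Tbsp." start with a letter, so they are not
# skipped and end up inside the captured group.
attempt_1 = r'[\s\d\.]* ([^\,]+).*'

# Attempt 2: take the text after a ". " that runs to the end without another period.
# Lines with no ". " (no abbreviated unit) give no match, and commas are kept.
attempt_2 = r'(?<=\. )[^.]*$'

results_1 = [re.findall(attempt_1, ing) for ing in ingredients]
results_2 = [re.findall(attempt_2, ing) for ing in ingredients]

for ing, r1, r2 in zip(ingredients, results_1, results_2):
    print(f'{ing!r}\n    attempt 1: {r1}\n    attempt 2: {r2}')
```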

The problem is that you are pairing the digits with the period.

\s\d*\.?

should correctly match the number (with or without a period).

You can use this pattern:

import re

for ing in ingredients:
    print(re.search(r'(?i)[a-z][^.,]*(?![^,])', ing).group())

Pattern details:

(?i)           # make the pattern case-insensitive (since Python 3.11, inline
               # global flags like this must come at the very start)
[a-z][^.,]*    # a substring that starts with a letter and contains no period
               # or comma
(?![^,])       # not followed by a character that is not a comma
               # (in other words, followed by a comma or the end of the string)
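The same search can also be written with re.VERBOSE, so that the pattern details above live inside the regex itself. This is a sketch: it passes re.IGNORECASE as a flag instead of the inline (?i), the names INGREDIENT and names are illustrative, and it assumes every line contains at least one letter, since search() returns None otherwise.

```python
import re

# The answer's pattern in verbose form; whitespace and comments in it are ignored.
INGREDIENT = re.compile(r"""
    [a-z][^.,]*   # starts with a letter and contains no period or comma
    (?![^,])      # followed by a comma or the end of the string
""", re.VERBOSE | re.IGNORECASE)

ingredients = ['1  medium tomato, cut into 8 wedges',
               '4  c. torn mixed salad greens',
               '1/2  small red onion, sliced and separated into rings',
               '1/4  small cucumber, sliced',
               '1/4  c. sliced pitted ripe olives',
               '2  Tbsp. reduced-calorie Italian salad dressing',
               '2  Tbsp. lemon juice',
               '1  Tbsp. water',
               '1/2  tsp. dried mint, crushed',
               '1/4  c. crumbled Feta cheese or 2 Tbsp. crumbled Blue cheese']

# search() returns the leftmost substring that satisfies both conditions:
# "c" in "c." or "Tbsp" in "Tbsp." is rejected because a period follows it.
names = [INGREDIENT.search(ing).group() for ing in ingredients]
print(names)
```

On the last line every candidate before "Tbsp." runs into a period, so the first acceptable match is "crumbled Blue cheese", which is why this pattern also drops the first of two alternatives joined by "or".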

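Description

I'd suggest using the following regex to find and replace the substrings you aren't interested in. Because it spells out the units of measure, it will also handle units that aren't abbreviated (cups, teaspoon, tablespoon).

\s*(?:(?:(?:[0-9]\s*)?[0-9]+\/)?[0-9]+\s*(?:(?:c\.|cups?|tsp\.|teaspoon|tbsp\.|tablespoon)\s*)?)|,.*|.*\bor\b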


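Replace with: nothing (an empty string). The unit names are written in lowercase, so apply the regex case-insensitively for it to match Tbsp.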
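A minimal Python sketch of this find-and-replace, assuming re.sub is applied to each ingredient separately and that the case-insensitive flag is set (the unit alternatives are lowercase while the data says Tbsp.). UNWANTED and cleaned are illustrative names.

```python
import re

# The answer's regex written as a raw string with single backslashes.
# re.IGNORECASE is an assumption so that "tbsp\." also matches "Tbsp.".
UNWANTED = re.compile(
    r"\s*(?:(?:(?:[0-9]\s*)?[0-9]+\/)?[0-9]+\s*"
    r"(?:(?:c\.|cups?|tsp\.|teaspoon|tbsp\.|tablespoon)\s*)?)"
    r"|,.*"
    r"|.*\bor\b",
    re.IGNORECASE,
)

ingredients = ['1  medium tomato, cut into 8 wedges',
               '4  c. torn mixed salad greens',
               '1/2  small red onion, sliced and separated into rings',
               '1/4  small cucumber, sliced',
               '1/4  c. sliced pitted ripe olives',
               '2  Tbsp. reduced-calorie Italian salad dressing',
               '2  Tbsp. lemon juice',
               '1  Tbsp. water',
               '1/2  tsp. dried mint, crushed',
               '1/4  c. crumbled Feta cheese or 2 Tbsp. crumbled Blue cheese']

# Replace every unwanted piece with nothing, then trim any leftover spaces.
cleaned = [UNWANTED.sub("", ing).strip() for ing in ingredients]
print(cleaned)
```

Applying it one ingredient at a time is deliberate: on a single multi-line string, the leading \s* of the first alternative can also consume the newline in front of each quantity, which would join neighbouring ingredients.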

Example

Live Demo

Showing how this would match:

https://regex101.com/r/qV5iR8/3

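Sample String

Note that the last line holds two ingredients separated by "or"; according to the OP, they want to eliminate the first of them.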

1  medium tomato, cut into 8 wedges
4  c. torn mixed salad greens
1/2  small red onion, sliced and separated into rings
1/4  small cucumber, sliced
1 1/4  c. sliced pitted ripe olives
2  Tbsp. reduced-calorie Italian salad dressing
2  Tbsp. lemon juice
1  Tbsp. water
1/2  tsp. dried mint, crushed
1/4  c. crumbled Feta cheese or 2 Tbsp. crumbled Blue cheese

After Replacement

medium tomato
torn mixed salad greens
small red onion
small cucumber
sliced pitted ripe olives
reduced-calorie Italian salad dressing
lemon juice
water
dried mint
crumbled Blue cheese

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (?:                      group, but do not capture:
----------------------------------------------------------------------
    (?:                      group, but do not capture (optional
                             (matching the most amount possible)):
----------------------------------------------------------------------
      (?:                      group, but do not capture (optional
                               (matching the most amount possible)):
----------------------------------------------------------------------
        [0-9]                    any character of: '0' to '9'
----------------------------------------------------------------------
        \s*                      whitespace (\n, \r, \t, \f, and " ")
                                 (0 or more times (matching the most
                                 amount possible))
----------------------------------------------------------------------
      )?                       end of grouping
----------------------------------------------------------------------
      [0-9]+                   any character of: '0' to '9' (1 or
                               more times (matching the most amount
                               possible))
----------------------------------------------------------------------
      \/                       '/'
----------------------------------------------------------------------
    )?                       end of grouping
----------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    (?:                      group, but do not capture (optional
                             (matching the most amount possible)):
----------------------------------------------------------------------
      (?:                      group, but do not capture:
----------------------------------------------------------------------
        c                        'c'
----------------------------------------------------------------------
        \.                       '.'
----------------------------------------------------------------------
       |                        OR
----------------------------------------------------------------------
        cup                      'cup'
----------------------------------------------------------------------
        s?                       's' (optional (matching the most
                                 amount possible))
----------------------------------------------------------------------
       |                        OR
----------------------------------------------------------------------
        tsp                      'tsp'
----------------------------------------------------------------------
        \.                       '.'
----------------------------------------------------------------------
       |                        OR
----------------------------------------------------------------------
        teaspoon                 'teaspoon'
----------------------------------------------------------------------
       |                        OR
----------------------------------------------------------------------
        tbsp                     'tbsp'
----------------------------------------------------------------------
        \.                       '.'
----------------------------------------------------------------------
       |                        OR
----------------------------------------------------------------------
        tablespoon               'tablespoon'
----------------------------------------------------------------------
      )                        end of grouping
----------------------------------------------------------------------
      \s*                      whitespace (\n, \r, \t, \f, and " ")
                               (0 or more times (matching the most
                               amount possible))
----------------------------------------------------------------------
    )?                       end of grouping
----------------------------------------------------------------------
  )                        end of grouping
----------------------------------------------------------------------
 |                        OR
----------------------------------------------------------------------
  ,                        ','
----------------------------------------------------------------------
  .*                       any character except \n (0 or more times
                           (matching the most amount possible))
----------------------------------------------------------------------
 |                        OR
----------------------------------------------------------------------
  .*                       any character except \n (0 or more times
                           (matching the most amount possible))
----------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
----------------------------------------------------------------------
  or                       'or'
----------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
----------------------------------------------------------------------
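To see which of the three top-level alternatives in the table above removes which piece of a line, each alternative can be wrapped in a named group and reported through Match.lastgroup. The group names quantity, after_comma and before_or, and the helper removed_pieces, are hypothetical; as before, the case-insensitive flag is an assumption needed for Tbsp.

```python
import re

# The answer's regex with each top-level alternative wrapped in a named group.
# The group names are illustrative only; they do not change what is matched.
LABELLED = re.compile(
    r"(?P<quantity>\s*(?:(?:(?:[0-9]\s*)?[0-9]+\/)?[0-9]+\s*"
    r"(?:(?:c\.|cups?|tsp\.|teaspoon|tbsp\.|tablespoon)\s*)?))"
    r"|(?P<after_comma>,.*)"
    r"|(?P<before_or>.*\bor\b)",
    re.IGNORECASE,  # assumed, so that "tbsp\." matches "Tbsp."
)


def removed_pieces(line):
    """Return (alternative name, removed text) for every match in line."""
    return [(m.lastgroup, m.group()) for m in LABELLED.finditer(line)]


for line in ['1  medium tomato, cut into 8 wedges',
             '1/4  c. crumbled Feta cheese or 2 Tbsp. crumbled Blue cheese']:
    print(removed_pieces(line))
```

For the "or" line, the quantity alternative removes "1/4  c. ", the before_or alternative removes "crumbled Feta cheese or", and the quantity alternative fires again on " 2 Tbsp. ", leaving "crumbled Blue cheese".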


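Statement: the technical posts on this site follow the CC BY-SA 4.0 license. If you need to reproduce them, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.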

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM