
Regex in Python 3: match everything after a number or optional period but before an optional comma

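I am trying to return the ingredients from a recipe, without any measurements or directions. The ingredients are in a list, like this: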

['1  medium tomato, cut into 8 wedges',
 '4  c. torn mixed salad greens',
 '1/2  small red onion, sliced and separated into rings',
 '1/4  small cucumber, sliced',
 '1/4  c. sliced pitted ripe olives',
 '2  Tbsp. reduced-calorie Italian salad dressing',
 '2  Tbsp. lemon juice',
 '1  Tbsp. water',
 '1/2  tsp. dried mint, crushed',
 '1/4  c. crumbled Feta cheese or 2 Tbsp. crumbled Blue cheese']

I would like to return the following list:

['medium tomato',
 'torn mixed salad greens',
 'small red onion',
 'small cucumber',
 'sliced pitted ripe olives',
 'reduced-calorie Italian salad dressing',
 'lemon juice',
 'water',
 'dried mint',
 'crumbled Blue cheese']

The closest pattern I have found is:

pattern = '[\s\d\.]* ([^\,]+).*'

But when I test it:

for ing in ingredients:
    print(re.findall(pattern, ing))

it returns the measurement abbreviation and its period along with the ingredient, for example:

['c. torn mixed salad greens']

pattern = '(?<=\. )[^.]*$'

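This second pattern fails to match lines that have no period, and keeps the text after the comma when both a period and a comma occur, i.e.: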

[]
['torn mixed salad greens']
[]
[]
['sliced pitted ripe olives']
['reduced-calorie Italian salad dressing']
['lemon juice']
['water']
['dried mint, crushed']
['crumbled Blue cheese']

Thanks in advance!

The problem is that you are coupling the digits with the periods.

\s\d*\.?

should correctly match the number (with or without a period).

You can use this pattern:

for ing in ingredients:
    # re.I is passed as a flag: Python 3.11+ rejects a global (?i) that is not at the start of the pattern
    print(re.search(r'[a-z][^.,]*(?![^,])', ing, re.I).group())

Pattern details:

[a-z][^.,]*  # a substring that starts with a letter and that doesn't contain a period
             # or a comma
(?![^,])     # not followed by a character that is not a comma
             # (in other words, followed by a comma or the end of the string)
re.I         # make the pattern case-insensitive (replaces the inline (?i) flag)
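As a self-contained check, the loop above can be run against the ingredient list from the question. This is only a minimal sketch: the helper name `extract_ingredient` is hypothetical, and the case-insensitive flag is passed as `re.IGNORECASE` rather than as an inline `(?i)`, because recent Python versions only accept global inline flags at the start of a pattern.

```python
import re

# Ingredient list copied from the question.
INGREDIENTS = [
    '1  medium tomato, cut into 8 wedges',
    '4  c. torn mixed salad greens',
    '1/2  small red onion, sliced and separated into rings',
    '1/4  small cucumber, sliced',
    '1/4  c. sliced pitted ripe olives',
    '2  Tbsp. reduced-calorie Italian salad dressing',
    '2  Tbsp. lemon juice',
    '1  Tbsp. water',
    '1/2  tsp. dried mint, crushed',
    '1/4  c. crumbled Feta cheese or 2 Tbsp. crumbled Blue cheese',
]

# A letter, then any run without '.' or ',', that must end at a comma
# or at the end of the string.
PATTERN = re.compile(r'[a-z][^.,]*(?![^,])', re.IGNORECASE)


def extract_ingredient(line):
    """Return the first matching run, or None if the line has no match."""
    match = PATTERN.search(line)
    return match.group() if match else None


names = [extract_ingredient(ing) for ing in INGREDIENTS]
print(names)
```

Returning `None` instead of calling `.group()` directly avoids an `AttributeError` on lines that contain no letters at all.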

Description

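I suggest using the following regex to find and replace the substrings you are not interested in. Because it spells out the units of measure, it will also handle units that are not abbreviated.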

\s*(?:(?:(?:[0-9]\s*)?[0-9]+\/)?[0-9]+\s*(?:(?:c\.|cups?|tsp\.|teaspoon|tbsp\.|tablespoon)\s*)?)|,.*|.*\bor\b


Replace with: nothing

Example

Live demo

This shows how the regex matches:

https://regex101.com/r/qV5iR8/3

Sample strings

Note that the last line lists two ingredients separated by "or"; according to the OP, the first ingredient should be dropped.

1  medium tomato, cut into 8 wedges
4  c. torn mixed salad greens
1/2  small red onion, sliced and separated into rings
1/4  small cucumber, sliced
1 1/4  c. sliced pitted ripe olives
2  Tbsp. reduced-calorie Italian salad dressing
2  Tbsp. lemon juice
1  Tbsp. water
1/2  tsp. dried mint, crushed
1/4  c. crumbled Feta cheese or 2 Tbsp. crumbled Blue cheese

After replacement

medium tomato
torn mixed salad greens
small red onion
small cucumber
sliced pitted ripe olives
reduced-calorie Italian salad dressing
lemon juice
water
dried mint
crumbled Blue cheese
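The replacement can be reproduced in Python with `re.sub`. The sketch below makes two assumptions that the answer does not state: the pattern is compiled with `re.IGNORECASE` (otherwise "Tbsp." would not match `tbsp\.`), and the helper name `strip_measurements` is made up for illustration.

```python
import re

# The regex from the answer, split across lines only for readability.
# Assumption: IGNORECASE is required so that "Tbsp." matches "tbsp\.".
UNWANTED = re.compile(
    r'\s*(?:(?:(?:[0-9]\s*)?[0-9]+\/)?[0-9]+\s*'
    r'(?:(?:c\.|cups?|tsp\.|teaspoon|tbsp\.|tablespoon)\s*)?)'
    r'|,.*|.*\bor\b',
    re.IGNORECASE,
)


def strip_measurements(line):
    """Delete quantities, units, trailing directions and 'X or' prefixes."""
    return UNWANTED.sub('', line)


# Sample strings copied from the answer.
samples = [
    '1  medium tomato, cut into 8 wedges',
    '4  c. torn mixed salad greens',
    '1/2  small red onion, sliced and separated into rings',
    '1/4  small cucumber, sliced',
    '1 1/4  c. sliced pitted ripe olives',
    '2  Tbsp. reduced-calorie Italian salad dressing',
    '2  Tbsp. lemon juice',
    '1  Tbsp. water',
    '1/2  tsp. dried mint, crushed',
    '1/4  c. crumbled Feta cheese or 2 Tbsp. crumbled Blue cheese',
]

cleaned = [strip_measurements(s) for s in samples]
for name in cleaned:
    print(name)
```

Note that the unit list is explicit, so any unit missing from the alternation (for example a plural such as "teaspoons") would need to be added by hand.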

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (?:                      group, but do not capture:
----------------------------------------------------------------------
    (?:                      group, but do not capture (optional
                             (matching the most amount possible)):
----------------------------------------------------------------------
      (?:                      group, but do not capture (optional
                               (matching the most amount possible)):
----------------------------------------------------------------------
        [0-9]                    any character of: '0' to '9'
----------------------------------------------------------------------
        \s*                      whitespace (\n, \r, \t, \f, and " ")
                                 (0 or more times (matching the most
                                 amount possible))
----------------------------------------------------------------------
      )?                       end of grouping
----------------------------------------------------------------------
      [0-9]+                   any character of: '0' to '9' (1 or
                               more times (matching the most amount
                               possible))
----------------------------------------------------------------------
      \/                       '/'
----------------------------------------------------------------------
    )?                       end of grouping
----------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    (?:                      group, but do not capture (optional
                             (matching the most amount possible)):
----------------------------------------------------------------------
      (?:                      group, but do not capture:
----------------------------------------------------------------------
        c                        'c'
----------------------------------------------------------------------
        \.                       '.'
----------------------------------------------------------------------
       |                        OR
----------------------------------------------------------------------
        cup                      'cup'
----------------------------------------------------------------------
        s?                       's' (optional (matching the most
                                 amount possible))
----------------------------------------------------------------------
       |                        OR
----------------------------------------------------------------------
        tsp                      'tsp'
----------------------------------------------------------------------
        \.                       '.'
----------------------------------------------------------------------
       |                        OR
----------------------------------------------------------------------
        teaspoon                 'teaspoon'
----------------------------------------------------------------------
       |                        OR
----------------------------------------------------------------------
        tbsp                     'tbsp'
----------------------------------------------------------------------
        \.                       '.'
----------------------------------------------------------------------
       |                        OR
----------------------------------------------------------------------
        tablespoon               'tablespoon'
----------------------------------------------------------------------
      )                        end of grouping
----------------------------------------------------------------------
      \s*                      whitespace (\n, \r, \t, \f, and " ")
                               (0 or more times (matching the most
                               amount possible))
----------------------------------------------------------------------
    )?                       end of grouping
----------------------------------------------------------------------
  )                        end of grouping
----------------------------------------------------------------------
 |                        OR
----------------------------------------------------------------------
  ,                        ','
----------------------------------------------------------------------
  .*                       any character except \n (0 or more times
                           (matching the most amount possible))
----------------------------------------------------------------------
 |                        OR
----------------------------------------------------------------------
  .*                       any character except \n (0 or more times
                           (matching the most amount possible))
----------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
----------------------------------------------------------------------
  or                       'or'
----------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
----------------------------------------------------------------------
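The node-by-node breakdown above can also be kept next to the pattern itself by compiling it with `re.VERBOSE`, which ignores whitespace and allows `#` comments inside the regex. This is a sketch rather than part of the original answer; the name `UNWANTED_VERBOSE` and the use of `re.IGNORECASE` are assumptions.

```python
import re

# Same regex as in the answer, laid out with comments.
# Assumption: IGNORECASE is added so that "Tbsp." matches "tbsp\.".
UNWANTED_VERBOSE = re.compile(r'''
    \s*                        # optional leading whitespace
    (?:
        (?:
            (?:[0-9]\s*)?      # optional whole part of a mixed number
            [0-9]+\/           # numerator and slash
        )?
        [0-9]+                 # whole number, or denominator of a fraction
        \s*
        (?:
            (?:c\.|cups?|tsp\.|teaspoon|tbsp\.|tablespoon)   # unit
            \s*
        )?
    )
  | ,.*                        # a comma and everything after it
  | .*\bor\b                   # everything up to and including the word "or"
''', re.VERBOSE | re.IGNORECASE)

result = UNWANTED_VERBOSE.sub(
    '', '1/4  c. crumbled Feta cheese or 2 Tbsp. crumbled Blue cheese')
print(result)
```

The verbose layout is safe here because the pattern contains no literal spaces or `#` characters that would otherwise need escaping.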
