
Python Regex: Extract all occurrences of a substring within a string

I am trying to extract all occurrences of a substring within a string using Python regex. This is what I have tried:

import re
line = "The dimensions of the first rectangle: 10'x20', second rectangle: 10x35cm, third rectangle: 30x35cm"
m = re.findall(r'\d+x.*?[a-zA-Z]', line)
print (m)

The output I get is ['10x35c', '30x35c']

The output I want is ["10'x20'", '10x35cm', '30x35cm']

You can do this without regex by using split:

In [1089]: m = [i.split(':')[1].strip() for i in line.split(',')]

In [1090]: m
Out[1090]: ["10'x20'", '10x35cm', '30x35cm']
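
The one-liner above assumes that every comma-separated chunk contains a colon; a chunk without one would raise an IndexError. A minimal sketch of a more forgiving variant follows, where the helper name values_after_colon is hypothetical and not part of the original answer:

```python
line = "The dimensions of the first rectangle: 10'x20', second rectangle: 10x35cm, third rectangle: 30x35cm"

def values_after_colon(text):
    """Return the text after the first colon in each comma-separated chunk."""
    values = []
    for chunk in text.split(","):
        # partition never raises; sep is '' when the chunk has no colon
        _, sep, value = chunk.partition(":")
        if sep:
            values.append(value.strip())
    return values

print(values_after_colon(line))  # ["10'x20'", '10x35cm', '30x35cm']
```

Like the original, this only works while the input keeps the "label: value, label: value" shape; for free-form text a regex is the safer choice.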

You can use this regex:

r"\d+['\"]?x\d+['\"]?(?:\s*[a-zA-Z]+)?"


Code:

>>> import re
>>> line = "The dimensions of the first rectangle: 10'x20', second rectangle: 10x35cm, third rectangle: 30x35cm"
>>> print (re.findall(r"\d+['\"]?x\d+['\"]?(?:\s*[a-zA-Z]+)?", line))
["10'x20'", '10x35cm', '30x35cm']

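RegEx details:

  • \d+ : match 1+ digits
  • ['\"]? : match an optional ' or "
  • x : match the letter x
  • \d+ : match 1+ digits
  • ['\"]? : match an optional ' or "
  • (?:\s*[a-zA-Z]+)? : match an optional unit of 1+ letters, optionally preceded by whitespace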
Use

import re
string = "The dimensions of the first rectangle: 10'x20', second rectangle: 10x35cm, third rectangle: 30x35cm"
print(re.findall(r"""\d+'?x\d+'?(?: *[a-z]+)?""", string, re.I))

Results: ["10'x20'", '10x35cm', '30x35cm']

See Python proof. re.I stands for case-insensitive matching.
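
To illustrate what re.I changes, here is a small sketch on a made-up input (the string text is an assumption, not from the question). Note that the flag applies to the whole pattern, so the separator x also matches an uppercase X:

```python
import re

pattern = r"\d+'?x\d+'?(?: *[a-z]+)?"
text = "10X35CM and 10x35cm"  # hypothetical input with an uppercase variant

# Without the flag, both x and [a-z] are case-sensitive.
case_sensitive = re.findall(pattern, text)
# With re.I, X and CM are accepted as well.
case_insensitive = re.findall(pattern, text, re.I)

print(case_sensitive)    # ['10x35cm']
print(case_insensitive)  # ['10X35CM', '10x35cm']
```

If only the unit should be case-insensitive, writing [a-zA-Z] and dropping the flag keeps the separator strict.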

Explanation

--------------------------------------------------------------------------------
  \d+                      digits (0-9) (1 or more times (matching
                           the most amount possible))
--------------------------------------------------------------------------------
  '?                       '\'' (optional (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  x                        'x'
--------------------------------------------------------------------------------
  \d+                      digits (0-9) (1 or more times (matching
                           the most amount possible))
--------------------------------------------------------------------------------
  '?                       '\'' (optional (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
     *                       ' ' (0 or more times (matching the most
                             amount possible))
--------------------------------------------------------------------------------
    [a-z]+                   any character of: 'a' to 'z' (1 or more
                             times (matching the most amount possible))
--------------------------------------------------------------------------------
  )?                       end of grouping
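
If the numbers and the unit are needed separately, the same pattern can be given named groups. This is a sketch under assumptions: the group names width, height and unit and the helper parse_dimensions are invented for illustration, and the quote marks are matched but not kept:

```python
import re

line = "The dimensions of the first rectangle: 10'x20', second rectangle: 10x35cm, third rectangle: 30x35cm"

# Same structure as the pattern explained above, with named capturing groups.
DIMENSION = re.compile(
    r"(?P<width>\d+)'?x(?P<height>\d+)'?(?: *(?P<unit>[a-z]+))?",
    re.I,
)

def parse_dimensions(text):
    """Return (width, height, unit) tuples; unit is None when absent."""
    return [
        (int(m.group("width")), int(m.group("height")), m.group("unit"))
        for m in DIMENSION.finditer(text)
    ]

print(parse_dimensions(line))  # [(10, 20, None), (10, 35, 'cm'), (30, 35, 'cm')]
```

finditer is used instead of findall so that each match object exposes the groups by name.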

