
Regular Expression, using non-greedy to catch optional string

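I'm parsing the content of a PDF with PDFMiner, and sometimes a line is present and sometimes it is not. I have tried to express the optional line without any success. Here is a piece of code that shows the problem: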

#!/usr/bin/python3
# coding=UTF8

import re

# Simulate reading text of a PDF file with PDFMiner.
pdfContent = """

Blah blah.

Date:  2022-01-31

Optional line here which sometimes does not show

Amount:  123.45

2: Blah blah.

"""

RE = re.compile(
    r".*?"
    r"Date:\s+(\S+).*?"
    r"(Optional line here which sometimes does not show){0,1}.*?"
    r"Amount:\s+(?P<amount>\S+)\n.*?"
    , re.MULTILINE | re.DOTALL)

matches = RE.match(pdfContent)

date     = matches.group(1)
optional = matches.group(2)
amount   = matches.group("amount")

print(f"date     = {date}")
print(f"optional = {optional}")
print(f"amount   = {amount}")

The output is:

date     = 2022-01-31
optional = None
amount   = 123.45

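Why is optional None? Notice that if I replace {0,1} with {1}, it works! But then the line is not optional anymore. I do use the non-greedy .*? everywhere...

This is probably a duplicate, but I searched and searched and did not find my answer, hence this question.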

You can use re.search (instead of re.match) with

Date:\s+(\S+)(?:.*?(Optional line here which sometimes does not show))?.*?Amount:\s+(?P<amount>\S+)

See the regex demo.
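The switch from re.match to re.search matters because re.match only tries the pattern at the very start of the string, which is why the question needed a leading .*?. A minimal sketch of the difference, using a shortened sample text and a hypothetical date_re pattern that are not part of the original code:

```python
import re

# Shortened stand-in for the PDF text; the Date line is not at the start.
text = "\n\nBlah blah.\n\nDate:  2022-01-31\n"

# Hypothetical pattern for illustration: only the Date part of the full regex.
date_re = re.compile(r"Date:\s+(\S+)")

# match() anchors at position 0, where the text starts with newlines.
print(date_re.match(text))            # None

# search() scans forward until the pattern fits somewhere.
print(date_re.search(text).group(1))  # 2022-01-31
```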

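In this pattern, .*?(Optional line here which sometimes does not show)? ( {0,1} = ? ) is wrapped in an optional non-capturing group, (?:...)?, which must be tried at least once because ? is a greedy quantifier. In the original pattern the optional group is tested only at the position right after the date, where it does not match, so it matches the empty string and the following lazy .*? consumes the optional line instead.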
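To make the difference concrete, here is a small side-by-side sketch. The shortened sample texts, the names loose and tight, and the helper optional_group are made up for illustration; only the shape of the two patterns comes from the question and the answer.

```python
import re

# Shortened sample texts (assumed, not the real PDF content).
text_with = "Date:  2022-01-31\n\nOptional line\n\nAmount:  123.45\n"
text_without = "Date:  2022-01-31\n\nAmount:  123.45\n"

# Shape used in the question: the optional group stands alone after a lazy .*?
loose = re.compile(
    r"Date:\s+(\S+).*?(Optional line)?.*?Amount:\s+(\S+)", re.DOTALL)

# Shape used in the answer: the lazy .*? is inside the optional group,
# so the engine has to scan forward for the line before giving up on it.
tight = re.compile(
    r"Date:\s+(\S+)(?:.*?(Optional line))?.*?Amount:\s+(\S+)", re.DOTALL)

def optional_group(pattern, text):
    """Return capture group 2 (the optional line), or None if absent."""
    m = pattern.search(text)
    return m.group(2) if m else None

print(optional_group(loose, text_with))     # None
print(optional_group(tight, text_with))     # Optional line
print(optional_group(tight, text_without))  # None
```

The first call returns None even though the line is present, which is the behaviour reported in the question.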

In your code, you can use it as

RE = re.compile(
    r"Date:\s+(\S+)(?:.*?"
    r"(Optional line here which sometimes does not show))?.*?"
    r"Amount:\s+(?P<amount>\S+)",
    re.DOTALL)

matches = RE.search(pdfContent)

See the Python demo:

import re
 
pdfContent = "\n\nBlah blah.\n\nDate:  2022-01-31\n\nOptional line here which sometimes does not show\n\nAmount:  123.45\n\n2: Blah blah.\n"
 
RE = re.compile(
    r"Date:\s+(\S+)(?:.*?"
    r"(Optional line here which sometimes does not show))?.*?"
    r"Amount:\s+(?P<amount>\S+)",
    re.DOTALL)
 
matches = RE.search(pdfContent)
date     = matches.group(1)
optional = matches.group(2)
amount   = matches.group("amount")
 
print(f"date     = {date}")
print(f"optional = {optional}")
print(f"amount   = {amount}")

Output:

date     = 2022-01-31
optional = Optional line here which sometimes does not show
amount   = 123.45

