How to print the next line of text extracted with pdfplumber in Python

Question

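How can I print the next line of the text I extracted from a PDF with pdfplumber's extract_text function?

I tried line.next() but it doesn't work.

The actual job name is on the line after "Job Name", as in the example below.

Job Name

Albany Shopping Centre Development

My code is below.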

import re

import pdfplumber

jobName_re = re.compile(r'(Job Name)')
siteAddress_re = re.compile(r'(Wellington\s)(.+)')
file = 'invoices.pdf'

lines = []

with pdfplumber.open(file) as myPdf:
    for page in myPdf.pages:
        text = page.extract_text()
        for line in text.split('\n'):
            jobName = jobName_re.search(line)
            siteAddress = siteAddress_re.search(line)
            if jobName:
                # This is the call that fails: line is a str, and str has no next() method.
                print('The next line that follows Job Name is', line.next())
            elif siteAddress:
                print(siteAddress.group(1))

Answer 1

You have several options.

Option 1

You can switch to looping over the lines with an integer index:

lines = text.split('\n')
for i in range(len(lines)):
    line = lines[i]

Then you can access lines[i+1], as long as i is not the last index.
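A minimal sketch of option 1, assuming the text has already been obtained from page.extract_text(). The helper name line_after_heading and the sample text are made up for illustration. It checks that a following line exists before reading lines[i + 1], which would otherwise raise IndexError when the heading is the last line on the page.

```python
import re

job_name_re = re.compile(r'Job Name')


def line_after_heading(text, heading_re=job_name_re):
    """Return the line following the first line that matches heading_re, or None."""
    lines = text.split('\n')
    for i, line in enumerate(lines):
        # Only look ahead when a following line actually exists.
        if heading_re.search(line) and i + 1 < len(lines):
            return lines[i + 1]
    return None


# Hypothetical sample standing in for the output of page.extract_text().
sample = "Invoice 123\nJob Name\nAlbany Shopping Centre Development\nWellington 6011"
print(line_after_heading(sample))
```

In the question's loop, this would be called once per page with text in place of sample.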

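Option 2

Set a flag to record that you have just seen the Job Name heading, then pick up the line on the next pass through the loop. Something like this: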

        last_was_job_heading = False
        for line in text.split('\n'):
            siteAddress = siteAddress_re.search(line)
            if last_was_job_heading:
                print('The next line that follows Job Name is', line)
            elif siteAddress:
                print(siteAddress.group(1))
            last_was_job_heading = jobName_re.search(line)

Option 3

Don't split the text into lines at all. Instead, use a smarter regular expression that parses multiple lines at once.
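One way option 3 might look, as a sketch rather than the only possible pattern: the regular expression below matches a line containing "Job Name", then captures the whole of the next line. The names job_name_block_re and job_names, and the sample text, are assumptions made for illustration.

```python
import re

# Match the rest of the "Job Name" line, the newline, then capture the next line.
job_name_block_re = re.compile(r'Job Name[^\n]*\n([^\n]+)')


def job_names(text):
    """Return every non-empty line that directly follows a 'Job Name' line."""
    return [m.group(1).strip() for m in job_name_block_re.finditer(text)]


# Hypothetical sample standing in for the output of page.extract_text().
sample = (
    "Invoice 123\n"
    "Job Name\n"
    "Albany Shopping Centre Development\n"
    "Wellington 6011\n"
)
print(job_names(sample))
```

Because the pattern runs over the whole page text, no flag or index bookkeeping is needed. A heading followed by an empty line is simply not matched.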

Option 4

Use some kind of parsing library instead of regular expressions. In a case this simple, that is probably overkill.
