Reading multiple files and saving them to xls columns (pypdf2 and xlsxwriter)

Question

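I need to take a directory containing multiple PDFs and structure it into an xls file.

But I don't understand how to build a list of the files in the directory and save their data to the xls.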

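import PyPDF2
import xlsxwriter

#---------------------Input file-----------------------------------#
# 'arquivo_file' is the question's placeholder; replace it with a real PDF path
with open('arquivo_file', 'rb') as pdf_file:
    read_pdf = PyPDF2.PdfReader(pdf_file)         # older code used PdfFileReader
    number_of_pages = len(read_pdf.pages)         # older code used getNumPages()
    page = read_pdf.pages[0]                      # only the first page is read
    page_content = page.extract_text()            # older code used extractText()

# Whitespace-normalized copy of the text (computed in the question, not used below)
text = page_content.replace("\n", " ").replace("\t", " ").replace("  ", "")
content = page_content.split("\n")
data = content[0]                                 # first line of text on page 1

#---------------------Output file----------------------------------#
# The question never created these; 'output.xlsx' is a hypothetical name
workbook = xlsxwriter.Workbook('output.xlsx')
worksheet = workbook.add_worksheet()
worksheet.write(1, 1, data)                       # zero-based: row 1, column 1 = cell B2
workbook.close()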

Answer 1

In general, your code will look something like this:

import os
import glob

DIRPATH = "/path/to/your/pdf/directory"

# Get list of files with extension .pdf in a given directory
pdf_filepaths = glob.glob(os.path.join(DIRPATH, '*.pdf'))

# Loop over the pdf file-paths
# For each pdf-file:
#   1. read each pdf file
#   2. process the content you read (optional)
#   3. save the processed content to excel file
for i, pdf_filepath in enumerate(pdf_filepaths):
    content = read_pdf_file(pdf_filepath)
    content = process_data(content)
    # f-string so that {i} is filled in: out_0.xlsx, out_1.xlsx, ...
    write_excel_file(filepath=f'out_{i}.xlsx', content=content)
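Two small details in the loop above may matter in practice: `glob.glob` returns paths in an order that depends on the filesystem, and numbering outputs by index makes it hard to tell which workbook came from which PDF. A hedged alternative, using the standard-library `pathlib` and a hypothetical helper `output_paths`, sorts the inputs and names each `.xlsx` after its source PDF:

```python
from pathlib import Path


def output_paths(dir_path, out_dir):
    """Map each *.pdf in dir_path to out_dir/<same stem>.xlsx, in sorted order."""
    # Sort because glob order is filesystem-dependent
    pdf_paths = sorted(Path(dir_path).glob("*.pdf"))
    return {pdf: Path(out_dir) / f"{pdf.stem}.xlsx" for pdf in pdf_paths}


# Hypothetical usage with the three helper functions this answer assumes:
# for pdf_path, xlsx_path in output_paths(DIRPATH, ".").items():
#     content = process_data(read_pdf_file(pdf_path))
#     write_excel_file(filepath=str(xlsx_path), content=content)
```

With this, `report.pdf` produces `report.xlsx` regardless of how many other PDFs are in the directory.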

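Here I assume you have wrapped the reading, processing, and writing logic in three functions (in a script, define them before the loop runs):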

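import PyPDF2
import xlsxwriter

def read_pdf_file(filepath):
    # your pdf reading logic goes here, e.g. the text of every page
    reader = PyPDF2.PdfReader(filepath)
    content = [page.extract_text() for page in reader.pages]
    return content

def process_data(content):
    # your post-reading data-processing logic goes here, e.g. split pages into lines
    lines = []
    for page_text in content:
        lines.extend(page_text.split("\n"))
    return lines

def write_excel_file(filepath, content):
    # your logic for writing to excel-file goes here, e.g. one line per row in column A
    workbook = xlsxwriter.Workbook(filepath)
    worksheet = workbook.add_worksheet()
    for row, line in enumerate(content):
        worksheet.write(row, 0, line)
    workbook.close()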
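The question title asks for a single workbook with the PDFs laid out in columns, while the loop in this answer writes a separate `out_{i}.xlsx` per PDF. Below is a minimal sketch of the single-workbook variant, assuming PyPDF2's `PdfReader` API and XlsxWriter are installed. The function names `pdf_text_lines`, `to_columns`, and `pdf_dir_to_xlsx` are hypothetical, and putting each file name in row 0 as a column header is an assumption the question does not specify.

```python
import glob
import os


def pdf_text_lines(pdf_path):
    """Return the non-empty text lines of every page in one PDF."""
    # Imported here so to_columns can be used without the PDF/Excel libraries installed
    from PyPDF2 import PdfReader

    reader = PdfReader(pdf_path)
    lines = []
    for page in reader.pages:
        text = page.extract_text() or ""
        lines.extend(line for line in text.split("\n") if line.strip())
    return lines


def to_columns(named_lines):
    """Turn {file_name: [lines]} into (row, col, value) cells, one column per file."""
    cells = []
    for col, (name, lines) in enumerate(named_lines.items()):
        cells.append((0, col, name))                 # header row: the file name
        for row, line in enumerate(lines, start=1):
            cells.append((row, col, line))           # text lines below the header
    return cells


def pdf_dir_to_xlsx(dir_path, out_path):
    """Write one column per PDF found in dir_path into a single workbook at out_path."""
    import xlsxwriter

    pdf_paths = sorted(glob.glob(os.path.join(dir_path, "*.pdf")))
    named_lines = {os.path.basename(p): pdf_text_lines(p) for p in pdf_paths}

    workbook = xlsxwriter.Workbook(out_path)
    worksheet = workbook.add_worksheet()
    for row, col, value in to_columns(named_lines):
        worksheet.write(row, col, value)             # XlsxWriter uses (row, col) order
    workbook.close()
```

For example, `pdf_dir_to_xlsx("/path/to/your/pdf/directory", "pdfs.xlsx")` would put the alphabetically first PDF in column A, the next in column B, and so on. Keeping the cell layout in `to_columns` separate from the file I/O makes it easy to switch to one row per PDF instead.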

Answer 1, 2021-07-16 19:49:42