
Read multiple files and save to xls in columns (PyPDF2 and xlsxwriter)

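I need to take a directory containing multiple PDFs and structure their contents into a single xls file.

But I don't understand how to build a list of the files in the directory and save the data to the xls.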

import PyPDF2
import xlsxwriter

#---------------------Input file-----------------------------------#
pdf_file = open('arquivo_file', 'rb')
read_pdf = PyPDF2.PdfFileReader(pdf_file)  # legacy PyPDF2 (< 3.0) API
number_of_pages = read_pdf.getNumPages()
page = read_pdf.getPage(0)
doc = read_pdf.getOutlines()
page_content = page.extractText()
text = page_content.replace("\n", " ").replace("\t", " ").replace("  ", "")
content = page_content.split("\n")
data = content[0]
# Output workbook: the original snippet never created it; the file name is a placeholder
workbook = xlsxwriter.Workbook('output.xlsx')
worksheet = workbook.add_worksheet()
worksheet.write(1, 1, data)
workbook.close()

In general, your code would look something like this.

import os
import glob

DIRPATH = "/path/to/your/pdf/directory"

# Get list of files with extension .pdf in a given directory
pdf_filepaths = glob.glob(os.path.join(DIRPATH, '*.pdf'))

# Loop over the pdf file-paths
# For each pdf-file:
#   1. read each pdf file
#   2. process the content you read (optional)
#   3. save the processed content to excel file
for i, pdf_filepath in enumerate(pdf_filepaths):
    content = read_pdf_file(pdf_filepath)
    content = process_data(content)
    write_excel_file(filepath=f'out_{i}.xlsx', content=content)

Here, I assume you have wrapped your reading, processing, and writing logic in three functions:

def read_pdf_file(filepath):
   # your pdf reading logic goes here
   ...

   return content

def process_data(content):
   # your post-reading data-processing logic goes here
   ...

   return content

def write_excel_file(filepath, content):
   # your logic for writing to excel-file goes here
   ...
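
As a rough illustration, the three placeholders could be filled in as below so that every PDF ends up as one column of a single workbook, which is closer to what the question asks for than one output file per PDF. This is only a sketch under several assumptions: PyPDF2 2.0 or later (where `PdfReader`, `reader.pages` and `extract_text()` are available), only the first page of each PDF is read, and the helper `pdfs_to_columns` and the changed `write_excel_file` argument (`columns`, a dict of file name to lines) are hypothetical names introduced here. Note that xlsxwriter produces .xlsx files, not legacy .xls.

```python
import glob
import os


def read_pdf_file(filepath):
    """Return the text of the first page of a PDF (assumes PyPDF2 >= 2.0)."""
    # Imported lazily so the pure-Python helpers work without PyPDF2 installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(filepath)
    # extract_text() may return an empty result for image-only pages.
    return reader.pages[0].extract_text() or ""


def process_data(content):
    """Split page text into non-empty, whitespace-trimmed lines."""
    return [line.strip() for line in content.split("\n") if line.strip()]


def write_excel_file(filepath, columns):
    """Write one worksheet column per PDF: file name in row 0, lines below."""
    import xlsxwriter

    workbook = xlsxwriter.Workbook(filepath)
    worksheet = workbook.add_worksheet()
    for col, (name, lines) in enumerate(columns.items()):
        worksheet.write(0, col, name)          # header cell
        worksheet.write_column(1, col, lines)  # one line of text per row
    workbook.close()


def pdfs_to_columns(dirpath):
    """Hypothetical helper: map each PDF file name to its processed lines."""
    columns = {}
    for pdf_filepath in sorted(glob.glob(os.path.join(dirpath, "*.pdf"))):
        content = read_pdf_file(pdf_filepath)
        columns[os.path.basename(pdf_filepath)] = process_data(content)
    return columns
```

With these definitions, calling `write_excel_file("out.xlsx", pdfs_to_columns(DIRPATH))` would produce a single workbook with one column per PDF; the output name `out.xlsx` is just a placeholder.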


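Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.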

 