How to open multiple files in pdfplumber?

I have multiple PDF files created with Access DB forms. The only way I can extract text from them is with pdfplumber. Here is my code, which works perfectly for a single file:

import pdfplumber

with pdfplumber.open('CS_page_1.pdf') as pdf:
    page = pdf.pages[0]
    string = page.extract_text()
    file_name = string[43:48]
    print(file_name)

I need to use this extracted string to rename this file and the 100 other files in the folder. What would be the best way to do it?

I would first build a list of all the PDFs in your folder using glob ( https://docs.python.org/3/library/glob.html ).

Then iterate through them: open each one with pdfplumber to obtain the desired string (the new file name), and rename each file individually ( https://www.tutorialspoint.com/python/os_rename.htm ). Something like this:

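import glob
import os
import pdfplumber

pdf_paths = glob.glob("/path/to/pdfs/*.pdf")

for path in pdf_paths:
    # Read the new name from the first page, then close the file before renaming.
    with pdfplumber.open(path) as pdf:
        text = pdf.pages[0].extract_text()
    # Keep the .pdf extension and stay in the same folder as the original.
    new_name = text[43:48] + ".pdf"
    os.rename(path, os.path.join(os.path.dirname(path), new_name))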
        
import glob
import pdfplumber
from tqdm.auto import tqdm

# tqdm wraps the file list to show a progress bar while looping.
for current_pdf_file in tqdm(glob.glob(r"<pathname>\*.pdf")):
    with pdfplumber.open(current_pdf_file) as my_pdf:
        pass  # do other things here
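
With around 100 files, two extracted strings may turn out identical, or a page may yield no text at all. Here is a rough sketch that splits the job into a planning step and a renaming step, so the mapping can be checked before anything on disk changes. The function names (name_from_first_page, plan_renames, apply_renames) are made up for this example and are not part of pdfplumber. The [43:48] slice is taken from the question and is assumed to hold for every file. The sketch does not sanitize extracted text that contains characters invalid in file names.

```python
import os
from pathlib import Path


def name_from_first_page(pdf_path):
    """Return characters 43-47 of the first page's text, as in the question."""
    import pdfplumber  # imported here so the planning logic runs without it

    with pdfplumber.open(pdf_path) as pdf:
        text = pdf.pages[0].extract_text() or ""
    return text[43:48].strip()


def plan_renames(folder, extract_name=name_from_first_page):
    """Map each PDF in `folder` to its new path without touching the disk."""
    plan = {}
    taken = set()
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        new_stem = extract_name(pdf_path)
        target = pdf_path.with_name(new_stem + ".pdf")
        # Skip empty names, names claimed earlier in this run,
        # and names that already belong to a different file.
        clash = target in taken or (target.exists() and target != pdf_path)
        if not new_stem or clash:
            print(f"skipping {pdf_path.name}: unusable name {new_stem!r}")
            continue
        taken.add(target)
        plan[pdf_path] = target
    return plan


def apply_renames(plan):
    """Rename every file listed in the plan."""
    for source, target in plan.items():
        if source != target:
            os.rename(source, target)
```

To use it, call plan = plan_renames("/path/to/pdfs"), look over the resulting dictionary, and then call apply_renames(plan). Skipped files are left untouched, so they can be handled by hand afterwards.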
