
Batch Export PDF Properties

TL;DR

I'm looking to take a directory full of PDF files and "export" their properties, specifically the page number, to a .CSV file.


Research

I have found numerous programs that let me batch export the metadata of a PDF, but this typically has to do with the source information and not the information available about the PDF itself.


Details

I need the page numbers to be able to deduce the order of pages. I'm using this for an indexing system that will allow two parties to locate and communicate about the documents. I plan to have an Excel document with the document titles and unique IDs that will need to correspond to sequential Bates numbers on the PDFs.

I don't mind coding or getting extensively creative with this, but it has to be something that can be done in batch, as there are many, many files.

Thank you in advance for any help you can provide.

You said you don't mind coding, so here's a short Python script that does what you want (as I understand it).

#!python3.6
import csv
import os

import fitz  # http://pymupdf.readthedocs.io/en/latest/


def main():
    """ Place script in same directory as PDFs. """
    script_dir = os.path.dirname(os.path.abspath(__file__))
    csv_filename = os.path.join(script_dir, 'pdf_information.csv')
    with open(csv_filename, mode='w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow([
            'Filename',
            'Page Count',
        ])
        for basename in os.listdir(script_dir):
            if basename.upper().endswith('.PDF'):
                filename = os.path.join(script_dir, basename)
                pdf = fitz.open(filename)
                writer.writerow([
                    basename,
                    pdf.page_count,  # older PyMuPDF releases used the camelCase name pageCount
                ])
                pdf.close()


if __name__ == '__main__':
    main()
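
If the end goal is to match each document to a run of sequential Bates numbers, the page counts in the CSV can be accumulated into a first and last number per file. The following is only a minimal sketch under a few assumptions that are not in the question: the documents are numbered in the order the rows appear, numbering starts at 1 unless told otherwise, and the helper names bates_ranges and read_page_counts are hypothetical. It reads the 'Filename' and 'Page Count' columns written by the script above.

```python
import csv
import io


def bates_ranges(rows, start=1):
    """Yield (filename, first_number, last_number) for each document.

    rows is an iterable of (filename, page_count) pairs, already in the
    order the documents should be numbered (an assumption of this sketch).
    """
    next_number = start
    for filename, page_count in rows:
        first = next_number
        last = first + int(page_count) - 1
        yield filename, first, last
        next_number = last + 1


def read_page_counts(csv_file):
    """Read (filename, page_count) pairs from an open CSV file object."""
    reader = csv.DictReader(csv_file)
    return [(row['Filename'], int(row['Page Count'])) for row in reader]


if __name__ == '__main__':
    # In-memory stand-in for pdf_information.csv, with made-up file names.
    sample = io.StringIO(
        'Filename,Page Count\r\n'
        'a.pdf,3\r\n'
        'b.pdf,1\r\n'
        'c.pdf,5\r\n'
    )
    for filename, first, last in bates_ranges(read_page_counts(sample)):
        print(filename, first, last)
```

To use it on the real output, open pdf_information.csv with newline='' and pass the file object to read_page_counts. Sort the rows first if the order returned by os.listdir is not the order the documents should be numbered in.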

