On the Mac OS X platform, I would like to write a script, either in Python or Tcl to search for text within a PDF file and extract the relevant parts. I appreciate any help.
I am writing scripts to look inside a PDF to determine if it is a bill, from what company, and for what period. Based on these information, I rename the PDF and move it to an appropriate directory. For example, file such as Statement_03948293929384.pdf
might become 2012-07-15 Water Bill.pdf
and moved to my Utilities
folder.
pdf-parser.py
by Didier Stevens I have found a command-line tool called pdftotext written by Glyph & Cog, LLC; built and packaged by Carsten Bluem . This tool is straight forward and it solves my problem. I am still looking out for those tools that can search PDF directly, without having to convert to text file.
I have successfully used PyODConverter to convert to/from PDFs (there is also a more powerful Java version). Once you have the PDF converted to text it should be trivial to do the searching. Also I believe iText should be capable of doing similar things, but I haven't tested it.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.