I'm performing a number of scraping and summarization tasks and have found that newspaper works perfectly for most of my needs. I also have a series of PDF files that I need to run similar tasks on. I can find other apps to open them and extract the stories, and I was hoping to just feed newspaper the text directly and let it do its thing. However, so far I have been unable to figure out how to do this. Any suggestions?
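PyMuPDF is a package that can handle the PDF side of this for you, including extracting the text. See the documentation:
https://pymupdf.readthedocs.io/en/latest/
Install it (pip install pymupdf), then import it in Python with:
import fitz
and follow the docs.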