
Use external file with Newspaper3k

I'm performing a number of scraping and summarization tasks and have found that newspaper works perfectly for most of my needs. I also have a series of PDF files that I need to process in the same way. I can find other tools to open the PDFs and extract the stories, so I was hoping to feed newspaper the text directly and let it do its thing. So far, however, I have been unable to figure out how to do this. Any suggestions?

PyMuPDF is a great package that can deal with your predicament. See its documentation:

https://pymupdf.readthedocs.io/en/latest/

Install it (the PyPI package is named PyMuPDF, while the module you import is fitz), then run:

import fitz

and follow the docs to extract the text from each PDF.
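To make that concrete, here is a minimal sketch of one way the two libraries could be combined. It is not from the original answer. The function names (clean_page_text, extract_pdf_text, summarize_text) are hypothetical. Setting download_state and is_parsed by hand is a workaround to get past the checks in nlp(), not a documented newspaper API, so treat it as an assumption that may break between versions. nlp() also needs the NLTK punkt tokenizer data to be installed.

```python
import re


def clean_page_text(raw: str) -> str:
    """Tidy text extracted from one PDF page (hypothetical helper)."""
    # Re-join words that were hyphenated across a line break.
    text = re.sub(r"-\n(?=[a-z])", "", raw)
    # Treat the remaining line breaks as ordinary spaces.
    text = re.sub(r"[ \t]*\n[ \t]*", " ", text)
    # Collapse runs of spaces and trim the ends.
    return re.sub(r" {2,}", " ", text).strip()


def extract_pdf_text(pdf_path: str) -> str:
    """Return the cleaned text of every page in the PDF."""
    import fitz  # PyMuPDF; imported here so the helper above has no dependencies

    with fitz.open(pdf_path) as doc:
        pages = [page.get_text() for page in doc]
    return "\n\n".join(clean_page_text(p) for p in pages)


def summarize_text(text: str, title: str = ""):
    """Run newspaper's NLP step on text that was not downloaded by newspaper."""
    from newspaper import Article
    from newspaper.article import ArticleDownloadState

    article = Article(url="")
    article.set_title(title)
    article.set_text(text)
    # Assumption: mark the article as downloaded and parsed so nlp() will run.
    article.download_state = ArticleDownloadState.SUCCESS
    article.is_parsed = True
    article.nlp()
    return article.summary, article.keywords
```

With a local file such as stories.pdf (a placeholder name), the call would look like summarize_text(extract_pdf_text("stories.pdf"), title="My story"). If a PDF holds several stories, you would need to split the extracted text yourself before summarizing each piece.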

