Use external file with Newspaper3k

Question

I'm performing a number of scraping and summarization tasks and have found that newspaper works perfectly for most of my needs. I also have a series of PDF files that I need to process in a similar way. I can find other tools to open the PDFs and extract the stories, so I was hoping to feed the text directly to newspaper and let it do its thing. So far, however, I have been unable to figure out how to do this. Any suggestions?

Answer 1

PyMuPDF is a good package for this: it can open PDF files and extract their text. See the documentation:

https://pymupdf.readthedocs.io/en/latest/

Install it with pip install pymupdf, then import it (the module is named fitz):

import fitz

and follow the docs to extract the text from each page.
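To connect the two libraries, here is a minimal sketch of one possible approach, not a tested recipe for your files. It assumes a recent PyMuPDF (where page.get_text() is available) and relies on newspaper accepting pre-supplied HTML through download(input_html=...) on an Article created with an empty URL. Since newspaper expects HTML rather than plain text, the extracted text is wrapped in paragraph tags first. The helper names pdf_to_text, text_to_html and summarize_text are made up for illustration, and nlp() needs the NLTK punkt data to be installed.

```python
import html


def pdf_to_text(pdf_path):
    """Extract plain text from every page of a PDF with PyMuPDF."""
    import fitz  # PyMuPDF; imported lazily so the HTML helper works without it

    with fitz.open(pdf_path) as doc:
        # Join the pages with blank lines so they become separate paragraphs.
        return "\n\n".join(page.get_text() for page in doc)


def text_to_html(text, title=""):
    """Wrap plain text in minimal HTML, one <p> per blank-line-separated block."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    body = "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
    return (
        f"<html><head><title>{html.escape(title)}</title></head>"
        f"<body><article>\n{body}\n</article></body></html>"
    )


def summarize_text(text, title=""):
    """Run newspaper's parse and nlp steps on text that was not downloaded."""
    from newspaper import Article  # newspaper3k

    article = Article(url="")  # assumption: an empty URL is accepted here
    article.download(input_html=text_to_html(text, title))  # no network request
    article.parse()
    article.nlp()  # requires NLTK punkt data
    return article.summary, article.keywords
```

Usage would look something like summary, keywords = summarize_text(pdf_to_text("story.pdf"), title="My story"), where story.pdf is a placeholder file name. Be aware that newspaper's content extraction is tuned for web pages, so very short paragraphs may be dropped; check article.text after parse() to see what survived.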
