Python Newspaper library results are inconsistent?

Question

I'm using Anaconda3 and have installed the newspaper library. It seems simple enough, but the results are inconsistent.

http://newspaper.readthedocs.io/en/latest/

import newspaper
cnn_paper = newspaper.build('http://www.cnn.com')
for article in cnn_paper.articles:
    print(article.url)
print(cnn_paper.size())

This simple piece of code sometimes returns all the articles and other times returns none.

Has anyone used this library, or does anyone know a better library for scraping news websites? I'd prefer not to write a parser myself, but if it comes to that, what should I use?

Answer 1

Found the fix:

https://github.com/codelucas/newspaper/issues/243

cnn_paper = newspaper.build('http://cnn.com', memoize_articles=False)
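
For anyone wondering why this helps: by default newspaper remembers the article URLs it has already seen and leaves them out of later builds, so the first run returns everything and repeat runs can come back empty. Below is a toy model of that behavior, as far as I understand it. It is not newspaper's actual code; filter_new_urls, the seen set, and the example.com URLs are all made up for illustration.

```python
def filter_new_urls(found_urls, seen_urls, memoize=True):
    """Toy model of article memoization (hypothetical, not newspaper's real code)."""
    if not memoize:
        # Comparable in spirit to memoize_articles=False: return everything found.
        return list(found_urls)
    # Keep only URLs that were not seen on an earlier run, then remember them.
    new_urls = [url for url in found_urls if url not in seen_urls]
    seen_urls.update(new_urls)
    return new_urls


# Stands in for a cache that persists between runs (assumption for this sketch).
seen = set()
found = ["http://example.com/a", "http://example.com/b"]

first_run = filter_new_urls(found, seen)                    # nothing cached yet
second_run = filter_new_urls(found, seen)                   # everything already seen
uncached_run = filter_new_urls(found, seen, memoize=False)  # cache ignored

print(len(first_run), len(second_run), len(uncached_run))  # prints: 2 0 2
```

The "2 0 2" pattern matches what the question describes: a full list on a fresh run, nothing on a repeat run, and a full list again once memoization is switched off.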
