
Python Newspaper library results are inconsistent?

I'm using Anaconda3 and installed the newspaper library. It seems simple enough, but the results are inconsistent.

http://newspaper.readthedocs.io/en/latest/

import newspaper
cnn_paper = newspaper.build('http://www.cnn.com')
for article in cnn_paper.articles:
    print(article.url)
print(cnn_paper.size())

This simple piece of code sometimes returns all the articles, and other times it returns none.

Has anyone used this library, or does anyone know a better library for scraping news websites? I'd prefer not to write a parser myself, but if it comes down to it, what should I use?

Found the fix:

https://github.com/codelucas/newspaper/issues/243

cnn_paper = newspaper.build('http://cnn.com', memoize_articles=False)
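Why this seems to help: by default, newspaper caches the article URLs it has already extracted and leaves them out of later builds, so a second run against the same site can come back empty. Passing memoize_articles=False turns that cache off. Below is a minimal sketch of how I'd wrap it. The helper name collect_article_urls and its build parameter are my own additions for illustration (the parameter only exists so the function can be exercised without network access); they are not part of the library.

```python
def collect_article_urls(site_url, build=None):
    """Return the article URLs found for site_url, without duplicates.

    `build` is a hypothetical hook for testing; when omitted, the real
    newspaper.build is used.
    """
    if build is None:
        # Imported lazily so the helper can be loaded without the library.
        import newspaper
        build = newspaper.build

    # memoize_articles=False disables the cache of previously seen articles,
    # so every run reports the full set instead of only new ones.
    paper = build(site_url, memoize_articles=False)

    seen = set()
    urls = []
    for article in paper.articles:
        # With the cache off, drop repeated URLs ourselves, keeping order.
        if article.url not in seen:
            seen.add(article.url)
            urls.append(article.url)
    return urls


if __name__ == "__main__":
    for url in collect_article_urls("http://cnn.com"):
        print(url)
```

The trade-off is that every run re-lists all articles, so if you only want the ones that are new since the last run, you would have to track that yourself.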
