Scraping multiple news article sources into one single list with NewsPaper library in Python?
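Dear Stackoverflow community!
This is a follow-up to a previous question I posted here.
I want to use the NewsPaper library to extract news article URLs from multiple sources into one list. This works fine for a single source, but as soon as I add a second source link, it only extracts the URLs from the second source.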
import feedparser as fp
import newspaper
from newspaper import Article

website = {
    "cnn": {"link": "edition.cnn.com", "rss": "http://rss.cnn.com/rss/cnn_topstories.rss"},
    "cnbc": {"link": "cnbc.com", "rss": "http://cnbc.com/id/10000664/device/rss/rss.html"},
}

for source, value in website.items():
    if 'rss' in value:
        d = fp.parse(value['rss'])
        # If there is an RSS URL for a source, its parsed feed is stored in d
        article_list = []
        for entry in d.entries:
            if hasattr(entry, 'published'):
                article = {}
                article['link'] = entry.link
                article_list.append(article['link'])
                print(article['link'])
The output is as follows; only the links from the second source are appended:
['https://www.cnbc.com/2019/10/23/why-china-isnt-cutting-lending-rates-like-the-rest-of-the-world.html', 'https://www.cnbc.com/2019/10/22/stocks-making-the-biggest-moves-after-hours-snap-texas-instruments-chipotle-and-more.html', ...]
I want all the URLs from both sources extracted into the list. Does anyone know a solution to this problem? Thanks a lot in advance!!
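`article_list` is being overwritten inside your first `for` loop. Every time you iterate over a new source, you set `article_list` to a new empty list, which throws away everything collected from the previous sources. That is why, at the end, you only have the links from one source: the last one.

You should initialize `article_list` once, before the loop, instead of overwriting it.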
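The effect can be reproduced without any network access. The sketch below is illustrative only: it replaces the `fp.parse(...)` results with two hypothetical hard-coded feeds (the `example.com` URLs are placeholders, not real feeds) and compares creating the list inside the loop with creating it once before the loop. The function names `collect_buggy` and `collect_fixed` are hypothetical.

```python
# Hypothetical stand-in for parsed feeds: source name -> list of article links.
fake_feeds = {
    "cnn": ["https://example.com/cnn/1", "https://example.com/cnn/2"],
    "cnbc": ["https://example.com/cnbc/1"],
}


def collect_buggy(feeds):
    """Re-creates the list for every source, as in the question's code."""
    for source, links in feeds.items():
        article_list = []  # reset on each iteration: earlier sources are lost
        for link in links:
            article_list.append(link)
    return article_list  # only holds the links of the last source


def collect_fixed(feeds):
    """Creates the list once, before looping over the sources."""
    article_list = []  # initialised once and shared by all sources
    for source, links in feeds.items():
        for link in links:
            article_list.append(link)
    return article_list  # holds the links of every source


print(collect_buggy(fake_feeds))
# ['https://example.com/cnbc/1']
print(collect_fixed(fake_feeds))
# ['https://example.com/cnn/1', 'https://example.com/cnn/2', 'https://example.com/cnbc/1']
```

Because dictionaries keep insertion order (Python 3.7+), the fixed version returns the CNN placeholders first and the CNBC placeholder last.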
import feedparser as fp
import newspaper
from newspaper import Article

website = {
    "cnn": {"link": "edition.cnn.com", "rss": "http://rss.cnn.com/rss/cnn_topstories.rss"},
    "cnbc": {"link": "cnbc.com", "rss": "http://cnbc.com/id/10000664/device/rss/rss.html"},
}

article_list = []  # INIT ONCE
for source, value in website.items():
    if 'rss' in value:
        d = fp.parse(value['rss'])
        # If there is an RSS URL for a source, its parsed feed is stored in d
        # article_list = [] THIS IS WHERE IT WAS BEING OVERWRITTEN
        for entry in d.entries:
            if hasattr(entry, 'published'):
                article = {}
                article['link'] = entry.link
                article_list.append(article['link'])
                print(article['link'])
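The same fix can also be packaged as a small function, so the accumulation happens in exactly one place. This is a sketch and not part of the original answer: the name `collect_links` is hypothetical. It assumes each parsed feed exposes `.entries` and each entry exposes `.link` and, optionally, `.published` as attributes, which is how the answer's code reads feedparser results. `types.SimpleNamespace` objects with placeholder data stand in for real feeds so the example runs offline.

```python
from types import SimpleNamespace


def collect_links(parsed_feeds):
    """Return the links of every entry that has a 'published' field, across all feeds.

    `parsed_feeds` is any iterable of objects with an `.entries` attribute
    (for example, one feedparser parse() result per RSS URL).
    """
    article_list = []  # one list shared by all sources
    for feed in parsed_feeds:
        # extend() appends this feed's links without replacing earlier ones
        article_list.extend(
            entry.link for entry in feed.entries if hasattr(entry, "published")
        )
    return article_list


# Offline stand-ins for two parsed feeds (hypothetical placeholder data).
feed_a = SimpleNamespace(entries=[
    SimpleNamespace(link="https://example.com/a/1", published="Wed, 23 Oct 2019"),
    SimpleNamespace(link="https://example.com/a/2"),  # no 'published': skipped
])
feed_b = SimpleNamespace(entries=[
    SimpleNamespace(link="https://example.com/b/1", published="Wed, 23 Oct 2019"),
])

print(collect_links([feed_a, feed_b]))
# ['https://example.com/a/1', 'https://example.com/b/1']
```

With real feeds, and assuming the `website` dictionary and `fp` import from the answer's code, the call would look like `collect_links(fp.parse(v['rss']) for v in website.values() if 'rss' in v)`.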