Web scraping news articles and keyword search

Question

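I have code that fetches the titles of news articles from web pages. I use a for loop to get the titles from 4 news websites. I have also implemented a word search that tells me how many articles use the word "coronavirus". I want the word search to tell me the number of articles containing the word "coronavirus" on each website. Right now, the output I get is the number of times the word "coronavirus" is used across all websites. Please help me, I have to submit this project soon. Here is the code: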

from bs4 import BeautifulSoup
from bs4.dammit import EncodingDetector
from newspaper import Article
import requests
URL=["https://www.timesnownews.com/coronavirus","https://www.indiatoday.in/coronavirus", "https://www.ndtv.com/coronavirus?pfrom=home-mainnavigation"]
for url in URL:
    parser = 'html.parser'  
    resp = requests.get(url)
    http_encoding = resp.encoding if 'charset' in resp.headers.get('content-type', '').lower() else None
    html_encoding = EncodingDetector.find_declared_encoding(resp.content, is_html=True)
    encoding = html_encoding or http_encoding
    soup = BeautifulSoup(resp.content, parser, from_encoding=encoding)
    
    links = []
    for link in soup.find_all('a', href=True):
        if "javascript" in link["href"]:
            continue
        links.append(link['href'])
            
    count = 0
     
            
    for link in links:
        try:
            article = Article(link)
            article.download()
            article.parse()
            print(article.title)
            if "COVID" in article.title or "coronavirus" in article.title or "Coronavirus" in article.title or "Covid-19" in article.title or "COVID-19" in article.title:
                count += 1
    
        except:
            pass
         
        
print(" number of articles with the word COVID:")
print(count)

Answer 1

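Actually, you are only getting the count for the last site: count is reset to 0 on every iteration of the outer loop and printed only once, after the loop has finished. If you want all of the counts, append each one to a list; then you can print the count for every site.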
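To see why only the last value survives, here is a minimal sketch that replaces the scraping step with made-up numbers. The `fake_counts` dictionary and its site names are hypothetical and stand in for the real inner loop over articles.

```python
# Hypothetical per-site results standing in for the real scraping step.
fake_counts = {"site_a": 3, "site_b": 5, "site_c": 2}

for site in fake_counts:
    count = 0                    # reset at the start of every iteration
    count += fake_counts[site]   # stands in for the inner loop over articles

# The print happens after the loop, so only the value from the
# final iteration ("site_c") is still stored in count.
print(count)
```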

First create an empty list, then append the final count at the end of each iteration:

URL = ["https://www.timesnownews.com/coronavirus", "https://www.indiatoday.in/coronavirus",
       "https://www.ndtv.com/coronavirus?pfrom=home-mainnavigation"]
Url_count = []

for url in URL:
    parser = 'html.parser'
    ...
    ...
        except:
            pass

    Url_count.append(count)

Then you can print the results using zip:

for url, count in zip(URL, Url_count):
    print("Site:", url, "Count:", count)
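The same per-site pattern can be tried without any network access. The following is only a sketch under stated assumptions: the `example.com` URLs and the sample titles are made up, `count_matching_titles` is a hypothetical helper, and the case-insensitive regular expression is a swapped-in simplification of the chained `in` checks from the question (it also matches lowercase "covid", which the original condition does not).

```python
import re

# Assumption: one case-insensitive pattern instead of the chained
# `"COVID" in title or "coronavirus" in title ...` checks.
KEYWORD_PATTERN = re.compile(r"covid|coronavirus", re.IGNORECASE)


def count_matching_titles(titles):
    """Return how many titles mention one of the keywords."""
    return sum(1 for title in titles if KEYWORD_PATTERN.search(title))


# Made-up titles keyed by site; in the real script these would be the
# article.title values collected inside the inner loop.
sample_titles = {
    "https://example.com/site-a": [
        "Coronavirus cases rise",
        "Budget announced",
        "COVID-19 vaccine update",
    ],
    "https://example.com/site-b": [
        "Cricket results",
        "Covid-19 lockdown extended",
    ],
}

# One count per site, stored under the site's URL.
site_counts = {
    url: count_matching_titles(titles) for url, titles in sample_titles.items()
}

for url, count in site_counts.items():
    print("Site:", url, "Count:", count)
```

Using a dictionary keyed by URL keeps each count attached to its site, so the separate list and the zip step are not needed; the list-plus-zip approach above works just as well.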

Solution 1
1 Accepted 2020-12-02 16:29:39
