Searching for related news with Scrapy

Question

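I want to scrape the Snopes fact-checking website using Scrapy. Specifically, I want to find related news based on input given by the user: the user enters a word, and the Scrapy spider returns the related news. For example, if I enter NASA, Scrapy should return the news related to NASA. I tried the code below, but it produces no output.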

import scrapy

class fakenews(scrapy.Spider):
    name = "snopes5"
    allowed_domains = ["snopes.com"]
    start_urls = [
        "https://www.snopes.com/fact-check/category/science/"
    ]

    def parse(self, response):
        name1 = input('Please Enter the search item you want for fake news: ')
        headers = response.xpath('//div[@class="media-body"]/h5').extract()
        headers = [c.strip().lower() for c in headers]
        if name1 in headers:
            print(response.xpath('//div[@class="navHeader"]/ul'))
            filename = response.url.split("/")[-2] + '.html'
            with open(filename, 'wb') as f:
                f.write(response.body)

Answer 1

There is one major error in your code:

c=response.xpath('//div[@class="navHeader"]/ul')
if name1 in c:
    ...

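Here c ends up being a SelectorList object, and you are checking whether the string name1 is in that SelectorList object, which will of course always be False.
To fix this, you need to extract the values: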

c=response.xpath('//div[@class="navHeader"]/ul').extract()
                                                ^^^^^^^^^^

You probably also want to process the values to make the matching more lenient:

headers = response.xpath('//div[@class="navHeader"]/ul').extract()
headers = [c.strip().lower() for c in headers]
if name1 in headers:
    ...

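The code above strips leading and trailing whitespace and lowercases everything, so the match becomes case-insensitive (provided name1 is lowercased as well).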
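One caveat with the snippet above: `name1 in headers` only tests whether some list element is exactly equal to name1, not whether name1 occurs inside one of the headers. A minimal pure-Python sketch of substring matching with the same normalization could look like the following; the helper name `find_matching_headers` and the sample headlines are hypothetical and used only for illustration.

```python
def find_matching_headers(headers, term):
    """Return the normalized headers that contain the search term.

    Both the headers and the term are stripped and lowercased,
    so the comparison ignores case and surrounding whitespace.
    """
    term = term.strip().lower()
    if not term:
        # An empty term would match every header, so return nothing instead.
        return []
    normalized = [h.strip().lower() for h in headers]
    return [h for h in normalized if term in h]


# Made-up sample data standing in for the extracted headlines.
sample = ["  Did This Gorilla Learn How to Knit? ", "NASA Finds Water"]
print(find_matching_headers(sample, "Gorilla"))
```

Lowercasing the search term inside the helper means the caller does not have to remember to do it.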

An example for your use case:

headers = response.xpath('//div[@class="media-body"]/h5/text()').extract()
headers = [c.strip().lower() for c in headers]
for header in headers:
    if 'gorilla' in header:
        print(f'yay matching header: "{header}"')

Output:

yay matching header: "did this gorilla learn how to knit?"

Solution 1
1 Accepted 2019-04-05 03:17:39