
Python Scrapy | How to pass the response to the main function from the spider

I have searched extensively for a solution, but I may not have been using the right keywords. I am aware that I can use the Scrapy shell to work with CSS and XPath selectors interactively, but I would like to know whether the same is possible in an IDE, outside the spider class, namely in another cell.

Example code:

import scrapy


class ExampleSpider(scrapy.Spider):
    name = "exampleSpider"
    start_urls = ["https://www.example.com"]

    def parse(self, response):
        URL = "www.example.com/1/"
        yield response

I would then like to be able to use this response and its selectors in another cell:

table_rows = response.xpath("//div[@class='example']/table/tr") # produces error
print(table_rows.xpath("td[4]//text()")[0].get())

It produces the error: NameError: name 'response' is not defined

Any help or guidance would be greatly appreciated.

If I understand correctly, you want the spider to return the response so that you can parse it in the main script?

main.py:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from scrapy.signalmanager import dispatcher
from scrapy import signals


def spider_output(spider):
    # Items yielded by the spider are collected in this list.
    output = []

    def get_output(item):
        output.append(item)

    # Call get_output every time the spider yields an item.
    dispatcher.connect(get_output, signal=signals.item_scraped)

    settings = get_project_settings()
    settings['USER_AGENT'] = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'
    process = CrawlerProcess(settings)
    process.crawl(spider)
    process.start()  # blocks until the crawl has finished

    return output


if __name__ == "__main__":
    spider = "exampleSpider"
    response = spider_output(spider)
    response = response[0]['response']
    title = response.xpath('//h3//text()').get()
    price = response.xpath('//div[@class="card-body"]/h4/text()').get()

    print(f"Title: {title}")
    print(f"Price: {price}")

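We start the spider and append every item it yields to output. Since output holds only one value, we do not need to loop and can simply take the first value, response[0]. We then want the value stored under the key response, hence response = response[0]['response'].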

spider.py:

import scrapy


class ExampleSpider(scrapy.Spider):
    name = "exampleSpider"
    start_urls = ['https://scrapingclub.com/exercise/detail_basic/']

    def parse(self, response):
        yield {'response': response}

Here we yield an item that contains the response.

The steps are: main -> spider_output -> spider -> return the response item to spider_output -> append the item to the output list -> return output to main -> get the response from the item -> parse the response.

Output:

Title: Long-sleeved Jersey Top
Price: $12.99

