How to get the results from yield in Python?

Question

Maybe yield in Python is elementary for some people, but not for me... at least not yet. I understand that yield creates a "generator".
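For reference, here is a minimal plain-Python illustration of what "yield creates a generator" means (count_up_to is a made-up name, not part of Scrapy): calling the function runs none of its body; values are produced only when something asks for them.

```python
def count_up_to(limit):
    """A generator function: calling it returns a generator object,
    and the body only runs as values are requested."""
    n = 1
    while n <= limit:
        yield n  # pause here and hand n to whoever asked for a value
        n += 1


gen = count_up_to(3)  # nothing has run yet
first = next(gen)     # runs the body up to the first yield
rest = list(gen)      # resumes and collects the remaining values
```

Here first is 1 and rest is [2, 3]; after that the generator is exhausted, so iterating it again yields nothing.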

I ran into yield when I decided to learn Scrapy. I wrote some code for a Spider that works like this:

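1) Go to the start URL and extract all hyperlinks. These are not complete URLs, only subdirectory paths relative to the start URL.
2) Check the links and append the ones that meet a specific criterion to the base URL.
3) Use Request to navigate to the new URL and parse it to find a unique ID in an element that has an "onclick" attribute.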

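import scrapy

class newSpider(scrapy.Spider):
    name = 'new'
    allowed_domains = ['www.alloweddomain.com']
    start_urls = ['https://www.alloweddomain.com']

    def parse(self, response):
        # Collect every href on the start page (mostly relative paths).
        links = response.xpath('//a/@href').extract()
        for link in links:
            if link == 'SpecificCriteria':
                # Turn the relative path into an absolute URL.
                next_link = response.urljoin(link)
                # Scrapy downloads next_link later and calls parse_new with that new response.
                yield scrapy.Request(next_link, callback=self.parse_new)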

Edit 1:

                for uid_dict in self.parse_new(response):
                    print(uid_dict['uid'])
                    break

End Edit 1

With this code, response inside parse_new is the HTTP response for start_urls, not the response for next_link.

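    def parse_new(self, response):
        # Each matching row comes back as a raw HTML string.
        trs = response.xpath("//*[@class='unit-directory-row']").getall()
        for tr in trs:
            if 'SpecificText' in tr:
                elements = tr.split()
                for element in elements:
                    if 'onclick' in element:
                        # Keep the text between the first '(' and the next ')'.
                        subelement = element.split('(')[1]
                        uid = subelement.split(')')[0]
                        print(uid)
                        yield {
                            'uid': uid
                        }
                # Stop after the first row that contains SpecificText.
                break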

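It works: Scrapy crawls the first page, builds the new URL, and navigates to the next page. parse_new parses the HTML for the uid and "yields" it, and Scrapy's engine log shows that the correct uid was yielded.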

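What I don't understand is how to "use" the uid produced by parse_new to build and navigate to yet another URL, the way I would use a variable. I can't seem to get a value returned from a Request.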
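The underlying point is that Scrapy's engine, not your code, sends each Request and later calls its callback with the new response, so a Request never hands a value back to the code that yielded it. To use the uid, parse_new itself would yield the next Request built from it; in Scrapy that would be something like scrapy.Request(new_url, callback=self.parse_detail, cb_kwargs={'uid': uid}), where cb_kwargs is available from Scrapy 1.7 onward and parse_detail is a hypothetical method name. Below is a toy, Scrapy-free simulation of that flow so it can run on its own. FakeRequest, run_engine, PAGES and the example.com URLs are all made up, and the engine is a heavy simplification (no concurrency, filtering, or middleware).

```python
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical stand-in, NOT Scrapy's API: a "request" is just data
# (a URL, a callback, and extra keyword arguments for that callback).
@dataclass
class FakeRequest:
    url: str
    callback: Callable
    cb_kwargs: dict = field(default_factory=dict)


# Hypothetical pages served by the fake "downloader".
PAGES = {
    "https://example.com/": "start page",
    "https://example.com/units": "row onclick=show(42)",
    "https://example.com/unit/42": "details for unit 42",
}


def run_engine(start):
    """Toy engine: fetch each request, call its callback with the fetched
    page, schedule any requests it yields, and collect everything else."""
    queue, items = [start], []
    while queue:
        req = queue.pop(0)
        page = PAGES[req.url]  # "download" the page
        for result in req.callback(page, **req.cb_kwargs):
            if isinstance(result, FakeRequest):
                queue.append(result)  # follow-up request: schedule it
            else:
                items.append(result)  # scraped item: collect it
    return items


def parse(page):
    # Like the spider's parse(): yield a request instead of returning data.
    yield FakeRequest("https://example.com/units", callback=parse_new)


def parse_new(page):
    # Extract the uid, then use it to build the NEXT request.
    uid = page.split("(")[1].split(")")[0]
    yield FakeRequest(f"https://example.com/unit/{uid}",
                      callback=parse_detail, cb_kwargs={"uid": uid})


def parse_detail(page, uid):
    # The uid arrives as a keyword argument, together with the new page.
    yield {"uid": uid, "detail": page}


items = run_engine(FakeRequest("https://example.com/", callback=parse))
```

Running it collects a single item, {'uid': '42', 'detail': 'details for unit 42'}: the uid travels forward through the chain of callbacks rather than being returned to parse.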

Answer 1

I'd check out "What does the 'yield' keyword do?", which explains very well how yield works.

In the meantime, spider.parse_new(response) returns an iterable object (a generator). That is, you can get the results it yields with a for loop. For example,
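A self-contained sketch of three common ways to consume such a generator. extract_uids is a hypothetical helper that repeats parse_new's string handling on plain strings so it runs without Scrapy, and the sample rows are made up. Each call creates a fresh generator, which is why it is called once per example. Note that calling self.parse_new(response) from inside parse, as in the question's Edit 1, runs it on the start page's response, because that is the only response parse has.

```python
def extract_uids(rows):
    """Hypothetical helper: the same string handling as parse_new,
    applied to plain strings instead of a Scrapy response."""
    for tr in rows:
        if 'SpecificText' in tr:
            for element in tr.split():
                if 'onclick' in element:
                    # Keep the text between '(' and the next ')'.
                    uid = element.split('(')[1].split(')')[0]
                    yield {'uid': uid}
            break  # as in parse_new: stop after the first matching row


# Made-up sample rows.
rows = [
    '<tr><td>Other</td></tr>',
    '<tr><td>SpecificText</td><td onclick="show(123)">Unit</td></tr>',
]

# 1) A for loop pulls values one at a time.
for uid_dict in extract_uids(rows):
    print(uid_dict['uid'])  # prints 123

# 2) A comprehension (or list()) collects every yielded value.
all_uids = [d['uid'] for d in extract_uids(rows)]

# 3) next() takes only the first value; the default avoids StopIteration.
first = next(extract_uids(rows), None)
```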

for uid_dict in spider.parse_new(response):
    print(uid_dict['uid'])

Answer 2

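After a lot of reading and learning, I found out why Scrapy wasn't executing the callback from the first parse. It has nothing to do with yield; it came down to two issues: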

1) robots.txt. This can be "solved" by setting ROBOTSTXT_OBEY = False in settings.py.

2) The logger reported "Filtered offsite request to ...". Passing dont_filter=True to the Request for that link solves this.
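A sketch of both fixes, assuming a standard Scrapy project layout. The robots.txt setting goes in settings.py. For the offsite filter, the usual alternatives are to widen allowed_domains (a bare domain such as 'alloweddomain.com' also matches its subdomains) or to pass dont_filter=True on the specific Request. Be aware that dont_filter=True also bypasses Scrapy's duplicate-request filter, and that ignoring robots.txt is only appropriate where you are permitted to crawl those pages. The spider lines are shown as comments because they need Scrapy installed.

```python
# settings.py (in the Scrapy project)
# Don't skip pages that the site's robots.txt disallows.
# Only change this if you are permitted to crawl those pages.
ROBOTSTXT_OBEY = False

# In the spider, either widen the allowed domains:
#   allowed_domains = ['alloweddomain.com']  # bare domain also covers subdomains
# or exempt one request from filtering:
#   yield scrapy.Request(next_link, callback=self.parse_new, dont_filter=True)
```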

Answer 1: score 0, 2020-05-12 19:45:53

Answer 2: score 0, accepted, 2020-05-18 10:42:44
