
Python/Scrapy: Callback function is never called

I am using Scrapy to crawl Google Play app profile pages, but the callback function is never executed. I can't find the problem in the code (there are no errors). Can you suggest a solution?

# -*- coding: utf-8 -*-
import scrapy

import time

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from playcrawl.items import PlaycrawlItem
from scrapy.http import Request

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.linkextractor import LinkExtractor

class GoogleplaySpider(CrawlSpider):
    name = 'googleplay'
    allowed_domains = ['play.google.com']
    start_urls = ['https://play.google.com/store/apps/category/GAME']

    rules = (
        Rule(LinkExtractor(allow=('/store/apps'))),
        Rule(LinkExtractor(allow=('/store/apps/details\?')),callback="parse_item")
        )

    def parse_item(self, response):
        hxs = HtmlXPathSelector(response)

        item = PlaycrawlItem()
        item["pub"] = hxs.select('//a[@class = "document-subtitle primary"]/span[1]').select("text()").extract()
        item["email"] = hxs.select('//a[contains(@class, "dev-link") and starts-with(@href, "mailto")]').select("@href").extract()[0][7:]

        f = open("D:\\_scrapy\\playcrawl\\data_emails.txt", "a")
        f.write(item["email"] + "\n")
        f.close()

        print("\n\n\n\n" + item["email"] + "\n\n\n\n")
        time.sleep(0)

        return item #yield item

I have tested your code, and the reason is simple.

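The spider never matches the second Rule. The first rule's pattern, '/store/apps', also matches every '/store/apps/details?' URL, and when several rules match the same link, CrawlSpider uses only the first one in the order they are defined. That first rule has no callback, so parse_item is never called.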

Try this:

rules = (
    Rule(LinkExtractor(allow=r'/store/apps'), callback="parse_item"),
    Rule(LinkExtractor(allow=r'/store/apps/details\?'), callback="parse_item"),
)

With that change it works, so the problem is not a bug in your code but in the logic of your rules.
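
To see why the order of the rules matters, here is a minimal plain-Python sketch of "first matching rule wins". It is not Scrapy's own code: it only assumes that each `allow` pattern is applied as a regex search against the link URL. The helper `first_matching_rule` and the example detail URL are hypothetical, made up for illustration.

```python
import re

# The `allow` patterns from the question's two rules, in declaration order.
RULE_PATTERNS = [r'/store/apps', r'/store/apps/details\?']


def first_matching_rule(url, patterns):
    """Return the first pattern that is found in url, or None if none match.

    Hypothetical helper: a simplified model of rule selection, not Scrapy API.
    """
    for pattern in patterns:
        if re.search(pattern, url):
            return pattern
    return None


# Hypothetical detail-page URL, used only for illustration.
DETAIL_URL = 'https://play.google.com/store/apps/details?id=com.example.game'

# Original order: the broad pattern claims the detail link first.
print(first_matching_rule(DETAIL_URL, RULE_PATTERNS))

# Reversed order: the specific details pattern claims it instead.
print(first_matching_rule(DETAIL_URL, list(reversed(RULE_PATTERNS))))
```

If this model holds for your spider, listing the details rule before the broader one in `rules` may be another way to send only detail pages to parse_item. I have not tested that variant against the live site.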
