Python/Scrapy: Callback function is never called

I am using Scrapy to crawl Google Play app profiles, but the callback function is never executed. I can't find the problem in the code (there are no errors). Can you suggest a solution?

# -*- coding: utf-8 -*-
import scrapy

import time

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from playcrawl.items import PlaycrawlItem
from scrapy.http import Request

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.linkextractor import LinkExtractor

class GoogleplaySpider(CrawlSpider):
    name = 'googleplay'
    allowed_domains = ['play.google.com']
    start_urls = ['https://play.google.com/store/apps/category/GAME']

    rules = (
        Rule(LinkExtractor(allow=('/store/apps'))),
        Rule(LinkExtractor(allow=('/store/apps/details\?')),callback="parse_item")
        )

    def parse_item(self, response):
        hxs = HtmlXPathSelector(response)

        item = PlaycrawlItem()
        item["pub"] = hxs.select('//a[@class = "document-subtitle primary"]/span[1]').select("text()").extract()
        item["email"] = hxs.select('//a[contains(@class, "dev-link") and starts-with(@href, "mailto")]').select("@href").extract()[0][7:]

        f = open("D:\\_scrapy\\playcrawl\\data_emails.txt", "a")
        f.write(item["email"] + "\n")
        f.close()

        print("\n\n\n\n" + item["email"] + "\n\n\n\n")
        time.sleep(0)

        return item #yield item

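I have tested your code, and the reason is simple.

The spider never reaches the second Rule. Every details link also matches the first rule's pattern, '/store/apps', and CrawlSpider hands each link to the first rule that matches it. The first rule has no callback, so parse_item is never called.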

Try this:

rules = (
    Rule(LinkExtractor(allow=(r'/store/apps')), callback="parse_item"),
    Rule(LinkExtractor(allow=(r'/store/apps/details\?')), callback="parse_item"),
)

With this change the callback runs. The code itself has no bug; the problem is in the logic of the rules.
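
To see why rule order matters, here is a minimal sketch that models first-match rule selection with plain regular expressions. It is not Scrapy's actual implementation: first_matching_callback, the (pattern, callback) tuples, and the example details URL are hypothetical stand-ins for illustration only.

```python
import re

# Hypothetical, simplified stand-in for a CrawlSpider rule: a URL pattern
# plus the name of its callback (None means "no callback").
def first_matching_callback(rules, url):
    """Return the callback of the first rule whose pattern matches url."""
    for pattern, callback in rules:
        if re.search(pattern, url):
            return callback
    return None

# Assumed example of a details link (not taken from the question).
details_url = "https://play.google.com/store/apps/details?id=com.example.game"

# Order from the question: the broad pattern comes first and wins.
original_rules = [
    (r"/store/apps", None),
    (r"/store/apps/details\?", "parse_item"),
]

# Alternative order: the specific pattern comes first.
reordered_rules = [
    (r"/store/apps/details\?", "parse_item"),
    (r"/store/apps", None),
]

print(first_matching_callback(original_rules, details_url))   # None
print(first_matching_callback(reordered_rules, details_url))  # parse_item
```

In a real spider, listing the details rule before the broad rule should have the same effect. Note that a Rule with a callback does not follow links by default, so follow=True would probably be needed on it if the crawl should continue from the pages it handles.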
