Pass variable to test.py in spider folder using scrapy

I'm using Scrapy. The following is the code for test.py in the spider folder.

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from craigslist_sample.items import CraigslistSampleItem

class MySpider(BaseSpider):
    name = "craig"
    allowed_domains = ["craigslist.org"]
    start_urls = ["http://seattle.craigslist.org/npo/"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        titles = hxs.select("//span[@class='pl']")
        items = []
        for title in titles:  # use a distinct loop variable so the list is not shadowed
            item = CraigslistSampleItem()
            item["title"] = title.select("a/text()").extract()
            item["link"] = title.select("a/@href").extract()
            items.append(item)
        return items

Essentially, I want to iterate over my list of URLs and pass each URL into the MySpider class as start_urls. Could anyone suggest how to do this?

Instead of having "statically defined" start_urls, you need to override the start_requests() method:

from scrapy.http import Request

class MySpider(BaseSpider):
    name = "craig"
    allowed_domains = ["craigslist.org"]

    def start_requests(self):
        list_of_urls = [...]  # reading urls from a text file, for example
        for url in list_of_urls:
            yield Request(url)

    def parse(self, response):
        ...
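
The answer leaves the URL list as a placeholder and only mentions reading it from a text file. As a rough sketch of that step, the helper below loads URLs from a file, assuming one URL per line. The function name load_urls and the convention of skipping blank lines and lines starting with "#" are illustrative assumptions, not part of the original answer or of Scrapy.

```python
from pathlib import Path


def load_urls(path):
    """Return the URLs found in a text file, one per line.

    Hypothetical helper: blank lines and lines starting with '#'
    are skipped, and surrounding whitespace is stripped.
    """
    urls = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return urls
```

Inside start_requests(), the placeholder could then become something like list_of_urls = load_urls("urls.txt"), where urls.txt is a made-up file name. Keeping the file handling in a plain function means it can be checked without running a crawl.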
