
Pass variable to test.py in spider folder using scrapy

I am using Scrapy. Below is the code for test.py in the spiders folder.

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from craigslist_sample.items import CraigslistSampleItem

class MySpider(BaseSpider):
    name = "craig"
    allowed_domains = ["craigslist.org"]
    start_urls = ["http://seattle.craigslist.org/npo/"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        titles = hxs.select("//span[@class='pl']")
        items = []
        for title in titles:
            item = CraigslistSampleItem()
            item["title"] = title.select("a/text()").extract()
            item["link"] = title.select("a/@href").extract()
            items.append(item)
        return items

Essentially, I want to iterate over my list of URLs and pass each URL to MySpider as its start_urls. Could anyone suggest how to do this?

Instead of having "statically defined" start_urls, you need to override the start_requests() method:

from scrapy.http import Request
from scrapy.spider import BaseSpider

class MySpider(BaseSpider):
    name = "craig"
    allowed_domains = ["craigslist.org"]

    def start_requests(self):
        list_of_urls = [...]  # reading urls from a text file, for example
        for url in list_of_urls:
            yield Request(url)

    def parse(self, response):
        ...
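
The comment above mentions reading the URLs from a text file. A minimal sketch of that step is below, assuming one URL per line; the helper name load_urls, the file name urls.txt, and the convention of skipping lines that start with '#' are all assumptions made for illustration, not part of Scrapy or of the original answer.

```python
def load_urls(path):
    """Return the URLs listed in a text file, one per line.

    Hypothetical helper: blank lines and lines starting with '#'
    are skipped (an assumed convention, not a Scrapy feature).
    """
    urls = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            url = line.strip()
            if url and not url.startswith("#"):
                urls.append(url)
    return urls


# Inside the spider, start_requests() could then iterate over the result:
#
#     def start_requests(self):
#         for url in load_urls("urls.txt"):  # "urls.txt" is a hypothetical file name
#             yield Request(url)
```

Keeping the file parsing in a plain function means it can be checked without running a crawl, and the spider itself only has to turn each URL into a Request.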

