
Python - Global Name is not defined

I promise I have read through the other versions of this question, but I was unable to find one relevant to my situation. If there is one, I apologize; I've been staring at this for a few hours now.

I've been toying with this a lot and actually got results on one version, so I know it's close.

The start_urls variable is defined as a list before the function, but for some reason it doesn't register at the global/module level.

Here is the exact error:

    for listing_url_list in start_urls:
    NameError: global name 'start_urls' is not defined

import time
import scrapy
from scrapy.http import Request
from scrapy.selector import Selector
from scrapy.spiders import CrawlSpider, Rule
from scraper1.items import scraper1Item

from scraper1 import csvmodule

absolute_pos = './/*[@id="xpath"]/td/@class'

class spider1(CrawlSpider):
    name = 'ugh'
    allowed_domains = ["ugh.com"]
    start_urls = [
        "http://www.website.link.1",
        "http://www.website.link.2",
        "http://www.website.link.3"
    ]

    def parse(self, response):
        Select = Selector(response)
        listing_url_list = Select.xpath('.//*[@id="xpath"]/li/div/a/@href').extract()
        for listing_url_list in start_urls:
            yield scrapy.Request(listing_url, callback=self.parselisting, dont_filter=True)

    def parselisting(self, response):
        ResultsDict = scraper1Item()
        Select = Selector(response)
        ResultsDict['absolute_pos'] = Select.xpath(absolute_pos).extract()
        ResultsDict['listing_url'] = response.url
        return ResultsDict

Use spider1.start_urls instead of start_urls.
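The underlying reason is that a class body does not create an enclosing scope for its methods, so a bare reference to a class attribute raises NameError; it must be qualified through self or the class name. A minimal illustration (class and attribute names hypothetical, not from the question):

```python
class Demo:
    # class attribute, analogous to start_urls on the spider
    items = ["a", "b"]

    def broken(self):
        try:
            # bare name: not local, not global, not builtin -> NameError
            return list(items)
        except NameError:
            return None

    def fixed(self):
        # qualify via self (or Demo.items)
        return list(self.items)

print(Demo().broken())  # None
print(Demo().fixed())   # ['a', 'b']
```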

You need to fix your parse() method:

  • you meant to use listing_url_list instead of start_urls
  • you meant to use listing_url instead of listing_url_list as the loop variable
  • there is no need to instantiate Selector - use the response.xpath() shortcut directly

Fixed version:

def parse(self, response):
    listing_url_list = response.xpath('.//*[@id="xpath"]/li/div/a/@href').extract()
    for listing_url in listing_url_list:
        yield scrapy.Request(listing_url, callback=self.parselisting, dont_filter=True)

As a side note, I think you don't need CrawlSpider and can use a regular scrapy.Spider instead, since you are not actually using rules with link extractors.
