
Python - Global Name is not defined

I promise I have read through the other versions of this question, but I was unable to find one relevant to my situation. If there is one, I apologize; I've been staring at this for a few hours now.

I've been toying with this a lot and actually got results on one version, so I know it's close.

The start_urls variable is defined as a list before the function, but for some reason it doesn't register at the global/module level.

Here is the exact error:

    for listing_url_list in start_urls:
    NameError: global name 'start_urls' is not defined

import time
import scrapy
from scrapy.http import Request
from scrapy.selector import Selector
from scrapy.spiders import CrawlSpider, Rule
from scraper1.items import scraper1Item

from scraper1 import csvmodule

absolute_pos = './/*[@id="xpath"]/td/@class'

class spider1(CrawlSpider):
    name = 'ugh'
    allowed_domains = ["ugh.com"]
    start_urls = [
        "http://www.website.link.1",
        "http://www.website.link.2",
        "http://www.website.link.3"
    ]

    def parse(self, response):
        Select = Selector(response)
        listing_url_list = Select.xpath('.//*[@id="xpath"]/li/div/a/@href').extract()
        for listing_url_list in start_urls:
            yield scrapy.Request(listing_url, callback=self.parselisting, dont_filter=True)

    def parselisting(self, response):
        ResultsDict = scraper1Item()
        Select = Selector(response)
        ResultsDict['absolute_pos'] = Select.xpath(absolute_pos).extract()
        ResultsDict['listing_url'] = response.url
        return ResultsDict

Use spider1.start_urls instead of start_urls.
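The underlying reason is that a class body does not create an enclosing scope for its methods, so a bare reference to a class attribute raises NameError; it must be qualified through self or the class name. A minimal illustration (class and attribute names hypothetical, not from the question):

```python
class Demo:
    # class attribute, analogous to start_urls on the spider
    items = ["a", "b"]

    def broken(self):
        try:
            # bare name: not local, not global, not builtin -> NameError
            return list(items)
        except NameError:
            return None

    def fixed(self):
        # qualify via self (or Demo.items)
        return list(self.items)

print(Demo().broken())  # None
print(Demo().fixed())   # ['a', 'b']
```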

You need to fix your parse() method:

  • you meant to use listing_url_list instead of start_urls
  • you meant to use listing_url instead of listing_url_list as the loop variable
  • there is no need to instantiate Selector - use the response.xpath() shortcut directly

Fixed version:

def parse(self, response):
    listing_url_list = response.xpath('.//*[@id="xpath"]/li/div/a/@href').extract()
    for listing_url in listing_url_list:
        yield scrapy.Request(listing_url, callback=self.parselisting, dont_filter=True)

As a side note, I think you don't need CrawlSpider and can use a regular scrapy.Spider instead, since you are not actually using rules with link extractors.
