I'm really new to Python and have been using it with Scrapy to build some web crawlers. When running a spider from the terminal I can use "-a NAME=VALUE" to set arguments, which is especially useful for directing it to different domains. I'm trying to use the "domain" argument as a variable in another module but got stuck. Here's a portion of the module I'm trying to import the argument from:
class Spider(spiders.CrawlSpider):
    name = 'changelog'
    rules = (spiders.Rule(SgmlLinkExtractor(), callback='parse_item', follow=True),)

    def __init__(self, domain='WHAT_IM_TRYING_TO_FIND', *args, **kwargs):
        super(Spider, self).__init__(*args, **kwargs)
        self.domain = domain
        self.allowed_domains = [domain]
        self.start_urls = [
            'http://%s/' % domain,
            'http://%s/index.html' % domain,
            'http://%s/index.php' % domain,
        ]
In a separate module, trying things like
from MyModule import Spider
variable = Spider.domain
or
variable = __import__('MyModule').Spider.domain
gives me
Class 'Spider' has no 'domain' member
Any guidance will be greatly appreciated!
Scrapy's file structure looks like this:
myproject/
    __init__.py
    items.py
    pipelines.py
    settings.py
    spiders/
        __init__.py
        spider.py
domain is an attribute of instances of Spider, not of the Spider class. You can only access domain if you have an instance of Spider created somewhere.
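To illustrate the difference, here is a minimal sketch. It uses a simplified stand-in class rather than the real CrawlSpider, so it runs without Scrapy installed; the 'example.com' and 'example.org' values are placeholders, not anything from the question.

```python
class Spider(object):
    """Simplified stand-in for the question's spider (no Scrapy required)."""

    # Class attribute: defined on the class itself, so Spider.name works.
    name = 'changelog'

    def __init__(self, domain='example.com'):
        # Instance attributes: these only exist once __init__ has run,
        # i.e. on an object created with Spider(...), never on the class.
        self.domain = domain
        self.allowed_domains = [domain]


# The class has 'name' but no 'domain'.
print(hasattr(Spider, 'name'))
print(hasattr(Spider, 'domain'))

# An instance does have 'domain', set from the constructor argument.
spider = Spider(domain='example.org')
print(spider.domain)
```

In a real Scrapy project you normally don't create the spider yourself; Scrapy builds the instance and passes the "-a" arguments to __init__. Components such as item pipelines are handed that running instance (for example as the spider argument of process_item), so reading spider.domain there is probably the closest equivalent to what the question is after.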