How do I fetch a scrapy.http.request.Request?

Note: I've gone through the Scrapy tutorial; I'd just like to know how fetch works.

In the Scrapy shell, this code works well:

>>> import scrapy
>>> url = 'http://quotes.toscrape.com/page/1/'
>>> def parse(response):
...     print('parse %s' % response)
... 
>>> req = scrapy.Request(url=url, callback=parse)
>>> fetch(req)

which gives me:

2020-07-03 05:21:04 [scrapy.core.engine] DEBUG: Crawled (404) <GET http://quotes.toscrape.com/robots.txt> (referer: None)
2020-07-03 05:21:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/1/> (referer: None)
parse <200 http://quotes.toscrape.com/page/1/>

How do I run it from a .py file?

I put the code in a file named fetch_req.py and ran it with this command:

python fetch_req.py

Then I got:

NameError: name 'fetch' is not defined

That makes sense, since fetch is a method of a scrapy.shell.Shell instance, so I added this to fetch_req.py:

from scrapy import shell
shell.Shell.fetch(req)

Then I got:

TypeError                                 Traceback (most recent call last)
<ipython-input-34-914d5e1bbfe3> in <module>()
----> 1 shell.Shell.fetch(req)

TypeError: fetch() missing 1 required positional argument: 'request_or_url'

I googled the error but found nothing. How do I fix this?

You have to yield the request, so that the Scrapy engine puts it into its queue and executes it.

To understand this better, follow @Gallaecio's suggestion and work through Scrapy's tutorial. It's pretty straightforward.

EDIT

I understand now what you mean; however, I don't see why you want to use Scrapy this way. The fetch method surely wasn't designed for this.

Anyhow, the problem is that you are calling fetch as if it were a staticmethod:

from scrapy import shell
shell.Shell.fetch(req)

You should instantiate the object and then call the method on that object. The missing-argument error is raised because fetch expects both self and request_or_url: when you call it on the class, req is bound to self, so nothing is left for request_or_url.

You can also try passing the class as the first argument (as if it were a classmethod).

Both approaches will likely lead to new exceptions, though.
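To see what each calling style does, here is a plain-Python illustration using a hypothetical FakeShell class that only copies the shape of Shell.fetch (a self parameter plus request_or_url); it is not Scrapy's code. The real scrapy.shell.Shell additionally depends on a crawler and the Twisted reactor, so with it the second and third calls would fail in other ways.

```python
class FakeShell:
    """Hypothetical stand-in: mimics only the signature of Shell.fetch."""

    def __init__(self, crawler):
        self.crawler = crawler  # instance state, set only by __init__

    def fetch(self, request_or_url):
        return f"{self.crawler} fetched {request_or_url}"


def try_call(label, func, *args):
    """Call func(*args) and describe either its result or the exception raised."""
    try:
        return f"{label}: {func(*args)}"
    except Exception as exc:
        return f"{label}: {type(exc).__name__}: {exc}"


req = "http://quotes.toscrape.com/page/1/"

results = [
    # 1. Called on the class: req is bound to `self`, so request_or_url is
    #    missing and Python raises TypeError, as in the question.
    try_call("class", FakeShell.fetch, req),
    # 2. Called on an instance: Python supplies `self` automatically.
    try_call("instance", FakeShell("my-crawler").fetch, req),
    # 3. The class passed as `self`: the call binds, but the body then fails
    #    because the instance attribute `crawler` does not exist on the class.
    try_call("class as self", FakeShell.fetch, FakeShell, req),
]

for line in results:
    print(line)
```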
