
Using Rules and Requests in Scrapy throws exception TypeError: __init__() got multiple values for keyword argument 'callback'

I am a beginner with Scrapy, and I have been trying to implement the following workflow: start from page A, a search results page containing links to full articles whose URLs end in a digit. My intention is to grab every link on each search results page, follow those links, and scrape the full articles.

I iterate over each page, collecting links with the following rule:

   rules = (Rule(SgmlLinkExtractor(allow=(r'\d+',)), callback='parse_short_story',follow=True),)
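The articles are described as URLs that end in a digit, but the allow pattern `\d+` is unanchored. To my understanding, Scrapy's link extractors test each URL with a regular-expression search, so `\d+` accepts any URL containing a digit anywhere, search pages included. The sketch below uses only the standard `re` module and made-up example URLs (assumptions, not the real site) to compare the original pattern with anchored alternatives.

```python
import re

# Hypothetical URLs for illustration only; they are not from the original site.
urls = [
    "http://example.com/article/12345",  # full article: ends in digits
    "http://example.com/search?page=2",  # search page: also ends in a digit
    "http://example.com/2014/about",     # digit in the middle only
    "http://example.com/contact",        # no digits at all
]

def allowed(pattern, url_list):
    """Return the URLs for which re.search finds the pattern anywhere."""
    regex = re.compile(pattern)
    return [u for u in url_list if regex.search(u)]

unanchored = allowed(r"\d+", urls)           # a digit anywhere in the URL
anchored = allowed(r"\d+$", urls)            # only URLs ending in digits
articles_only = allowed(r"/article/\d+$", urls)  # hypothetical article path

print(unanchored)
print(anchored)
print(articles_only)
```

Anchoring with `$` alone still lets the search page through, so a pattern that also includes a distinctive path segment (here the hypothetical `/article/`) is usually what separates article links from pagination links.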

The rule is meant to ensure that the trailing number in the search page URL advances to the next page once I am done collecting the links and scraping the full articles of the current page.
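If you prefer to step through the search pages explicitly instead of relying on the rule to follow pagination links, the next URL can be computed by incrementing the trailing number. This is a minimal sketch; `next_page_url` is a hypothetical helper and it assumes the page number is the last run of digits at the very end of the URL.

```python
import re

def next_page_url(url):
    """Return `url` with its trailing number incremented by one.

    Hypothetical helper: assumes the page number is the final run of digits
    at the end of the URL (e.g. ...?page=7 -> ...?page=8). Zero padding is
    not preserved. Returns None if the URL does not end in digits.
    """
    match = re.search(r"(\d+)$", url)
    if match is None:
        return None
    number = int(match.group(1)) + 1
    return url[:match.start(1)] + str(number)

print(next_page_url("http://example.com/search?q=foo&page=9"))
```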

The parse_short_story method simply uses select() to filter the relevant portion of the HTML page, then loops over that portion to collect the links to the full stories and pass each one on in a request:

    for short_story in short_stories:
        item = DmozItem()

        full_story_link = short_story.select(".//h2/a/@href").extract()

        if full_story_link:
            yield Request(full_story_link, self.parse_full_story, callback='self.parse_full_story', errback=lambda _: item, meta=dict(item=item),)

        items.append(item)
    return items

From my understanding of the Scrapy tutorial, I need to return the items at the end of the parser methods, so that the rule properly appends them to a final list of items that I can then dump to a JSON file (or another format) when running the spider from the console. Notice the Request and return calls in the code above, which is where it crashes. I can't figure out how to use both the Request and the returned items.
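Part of the difficulty is that any function containing `yield` is a generator. In Python 2, a `return` with a value inside a generator is rejected with a SyntaxError; in Python 3 it is allowed, but the value only ends up on the StopIteration exception, so anything that simply iterates the generator (as a crawler does with callback output) never sees it. The stand-alone sketch below, with plain strings standing in for requests and items (no Scrapy involved), shows the difference.

```python
def callback_with_return():
    """Mimics a callback that yields requests and also returns a list."""
    items = []
    for n in range(3):
        yield "request-%d" % n  # stands in for yield Request(...)
        items.append("item-%d" % n)
    return items  # Python 3: stored on StopIteration.value, never yielded

def callback_yielding_everything():
    """Yields each output individually, so the consumer sees all of them."""
    for n in range(3):
        yield "request-%d" % n
        yield "item-%d" % n

# A consumer that just iterates, like a crawler processing callback output.
seen_with_return = list(callback_with_return())
seen_yielding = list(callback_yielding_everything())

print(seen_with_return)  # the returned items are missing
print(seen_yielding)
```

In the question's design the item only receives its data in parse_full_story, so parse_short_story would typically yield just the Requests and let parse_full_story return (or yield) the filled item.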

The parse_full_story method receives the response parameter just as parse_short_story does, and recovers the item I sent along as a parameter with

    item = response.meta.get('item')

After setting the information I want on item, I use return item.
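The hand-off described here, where the item rides along in the request's meta and the second callback fills it in, can be modelled without Scrapy. `FakeRequest`, `FakeResponse` and the `run` loop below are hypothetical stand-ins, not Scrapy's API; they only mimic the idea that a response exposes its originating request's meta dict, so the item object created in the first callback is the same one the second callback receives.

```python
class FakeRequest:
    """Hypothetical stand-in for a request: a URL, a callback and a meta dict."""
    def __init__(self, url, callback, meta=None):
        self.url = url
        self.callback = callback
        self.meta = meta if meta is not None else {}

class FakeResponse:
    """Hypothetical stand-in: exposes the originating request's meta dict."""
    def __init__(self, request, body):
        self.request = request
        self.body = body

    @property
    def meta(self):
        return self.request.meta

def parse_short_story(links):
    for link in links:
        item = {"link": link}  # plays the role of DmozItem()
        yield FakeRequest(link, parse_full_story, meta={"item": item})

def parse_full_story(response):
    item = response.meta.get("item")  # same object created above
    item["text"] = response.body      # fill in the scraped data
    return item

def run(links, pages):
    """Tiny driver: 'download' each request from a dict and call its callback."""
    results = []
    for request in parse_short_story(links):
        response = FakeResponse(request, pages[request.url])
        results.append(request.callback(response))
    return results

pages = {"/story/1": "first story", "/story/2": "second story"}
print(run(["/story/1", "/story/2"], pages))
```

In a real spider Scrapy's engine does the downloading and calls the callback; newer Scrapy releases also offer a `cb_kwargs` argument on Request as an alternative to meta for passing data to callbacks.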

In summary:

My expectation was that the rule would take care of moving through the search pages containing links to the full articles, using parse_short_story as the callback, while for each link on each page, parse_full_story would access the full article, scrape what I wanted, add it to item, and exit, so that in the end all full articles would be scanned.

Apparently my understanding is wrong, because I get this error:

    yield Request(full_story_link, self.parse_full_story, callback='self.parse_full_story', errback=lambda _: item, meta=dict(item=item),)
    exceptions.TypeError: __init__() got multiple values for keyword argument 'callback'

You can find the full runnable code here. As it runs, you will see that it keeps throwing the exception. If it is feasible to give a direct fix and/or a short explanation of what is wrong here, I would appreciate it, since searching for similar problems only led me to Django-related questions on the web.

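Pass only one callback argument, and make it self.parse_full_story itself (Request() expects a callable; see here). The second positional parameter of Request is callback, so passing self.parse_full_story positionally and also callback=... as a keyword gives callback two values, which is exactly what the TypeError reports.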
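The clash is ordinary Python argument binding and can be reproduced without Scrapy. `make_request` below is a hypothetical stand-in with a simplified parameter list whose first two parameters mirror Request(url, callback, ...); it is not Scrapy's full signature.

```python
def make_request(url, callback=None, method="GET", meta=None, errback=None):
    """Hypothetical stand-in whose first two parameters mirror Request(url, callback, ...)."""
    return {"url": url, "callback": callback, "errback": errback, "meta": meta}

def parse_full_story(response):
    return response

# Broken: the second positional argument already binds to `callback`,
# and then callback=... tries to bind it a second time.
try:
    make_request("http://example.com/1", parse_full_story,
                 callback="parse_full_story")
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Fixed: pass the callable exactly once (positionally or as callback=).
fixed = make_request("http://example.com/1", callback=parse_full_story,
                     meta={"item": {}})
print(fixed["callback"] is parse_full_story)
```

Python 3 words the message as "got multiple values for argument 'callback'", while Python 2 says "keyword argument", as in the traceback in the question.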

The "callback name string" version is only for Rules (see here).
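A Rule can accept a method name because CrawlSpider looks that name up on the spider instance, roughly like getattr(spider, name). The resolver below is a hypothetical sketch of that lookup, not Scrapy's code. It also suggests why a string such as 'self.parse_full_story' would not work even in a Rule: the lookup expects the bare method name, and no attribute is literally called "self.parse_full_story".

```python
class DemoSpider:
    """Hypothetical spider with two callbacks."""
    def parse_short_story(self, response):
        return "short:" + response

    def parse_full_story(self, response):
        return "full:" + response

def resolve_callback(spider, callback):
    """Sketch of Rule-style resolution: callables pass through, strings are looked up by name."""
    if callable(callback):
        return callback
    if isinstance(callback, str):
        return getattr(spider, callback, None)  # None if no such attribute
    return None

spider = DemoSpider()
by_name = resolve_callback(spider, "parse_short_story")
by_callable = resolve_callback(spider, spider.parse_full_story)
bad_name = resolve_callback(spider, "self.parse_full_story")

print(by_name("page"))
print(by_callable("page"))
print(bad_name)  # None: the dotted string is not an attribute name
```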

Use

yield Request(full_story_link,
    self.parse_full_story,
    errback=lambda _: item,
    meta=dict(item=item),
)

instead of

yield Request(full_story_link,
    self.parse_full_story, callback='self.parse_full_story',
    errback=lambda _: item,
    meta=dict(item=item),
)
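Even with the callback fixed, `full_story_link` comes from `.extract()`, which returns a list of strings, whereas a Request needs a single URL string; hrefs taken from `@href` may also be relative. Scrapy provides helpers for this (for example `extract_first()` and `response.urljoin()` in newer versions); the minimal stdlib sketch below shows the same idea with a hypothetical helper, assuming you want the first extracted href resolved against the page URL.

```python
from urllib.parse import urljoin

def first_absolute_link(page_url, extracted_hrefs):
    """Hypothetical helper: turn the list returned by .extract() into one absolute URL.

    Returns None when nothing was extracted, so the caller can skip the link.
    """
    if not extracted_hrefs:
        return None
    return urljoin(page_url, extracted_hrefs[0])

page = "http://example.com/search?page=3"
print(first_absolute_link(page, ["/article/12345"]))
print(first_absolute_link(page, []))
```

Inside the loop this would be used as `url = first_absolute_link(response.url, short_story.select(".//h2/a/@href").extract())`, followed by `if url: yield Request(url, callback=self.parse_full_story, meta=dict(item=item))`.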


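Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license; if you reproduce them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.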
