Error: raise ValueError("No <form> element found in %s" % response) occurs when trying to log in with Scrapy

Problem Description:

I want to crawl some information from my college's BBS. Here is the address: http://bbs.byr.cn. Below is the code of my spider:

import scrapy
from lxml import etree  # used later to inspect the parsed page


class ITJobInfoSpider(scrapy.Spider):
    name = "ITJobInfoSpider"
    start_urls = ["http://bbs.byr.cn/#!login"]

    def parse(self, response):
        # Fill in and submit the login form found on the start page
        return scrapy.FormRequest.from_response(
            response,
            formdata={'method': 'post', 'id': 'username', 'passwd': 'password'},
            formxpath='//form[@action="/login"]',
            callback=self.after_login,
        )

    def after_login(self, response):
        # Dump the page returned after the login attempt
        print("######response body: " + response.text + "\n")
        if "authentication failed" in response.text:
            print("#######Login failed#########\n")

However, with this code I often get the error: raise ValueError("No <form> element found in %s" % response)

My Investigation:

I found that this error happens when Scrapy tries to parse the HTML of http://bbs.byr.cn; Scrapy parses the page with lxml. Below is the relevant code in Scrapy:

root = LxmlDocument(response, lxml.html.HTMLParser)
forms = root.xpath('//form')
if not forms:
    raise ValueError("No <form> element found in %s" % response)

So I inspected the parsed tree with print(etree.tostring(root)) and found that the closing tag </form> had been parsed into the text &lt;/form&gt;. No wonder forms = root.xpath('//form') returns an empty list.
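The check that fails is simply whether the parsed page contains a real `<form>` element. Below is a minimal stdlib-only sketch of that check; it uses Python's `html.parser` as a stand-in for lxml (an assumption: the two parsers differ in details), and both sample pages are made up for illustration. Markup that only exists inside a `<script>` string is text, not an element, which is one way a page can appear to contain a form while `//form` finds nothing.

```python
from html.parser import HTMLParser


class FormFinder(HTMLParser):
    """Records the action attribute of every <form> start tag it sees."""

    def __init__(self):
        super().__init__()
        self.actions = []

    def handle_starttag(self, tag, attrs):
        # Only real start tags reach this method; text inside <script> does not.
        if tag == "form":
            self.actions.append(dict(attrs).get("action"))


def find_form_actions(html):
    """Return the action of each <form> element in an HTML string."""
    finder = FormFinder()
    finder.feed(html)
    finder.close()
    return finder.actions


# Hypothetical page with a real login form.
with_form = '<html><body><form action="/login" method="post"></form></body></html>'

# Hypothetical page where the form only exists inside a JavaScript string.
js_only = ('<html><body><script>'
           'document.write("<form action=/login></form>");'
           '</script></body></html>')

print(find_form_actions(with_form))  # ['/login']
print(find_form_actions(js_only))    # []
```

Inside a spider, the equivalent sanity check would be to test `response.xpath('//form[@action="/login"]')` for a non-empty result before calling `FormRequest.from_response`, and log the URL if it is empty.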

But I don't know why this is happening. Could it be the encoding of the HTML? (The page is encoded in GBK, not UTF-8.) Thanks in advance to anyone who can help me out! BTW, if anyone wants to write code against the website, I can give you a test account; please leave me an email address in the comments.
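A quick way to probe the encoding hypothesis, as a stdlib-only sketch with made-up markup: GBK is ASCII-compatible, and the bytes for `<`, `/` and `>` never occur inside a two-byte GBK character (trail bytes start at 0x40). So even decoding GBK bytes with the wrong codec garbles only the Chinese text and cannot, by itself, turn a `</form>` tag into text. This suggests the encoding is an unlikely cause, though it says nothing about what the server actually sent.

```python
# Hypothetical login-form markup with Chinese text ("登录" means "log in").
markup = '<form action="/login" method="post">登录</form>'

gbk_bytes = markup.encode("gbk")

# Decoding with the right codec round-trips exactly.
roundtrip = gbk_bytes.decode("gbk")

# Decoding with a wrong single-byte codec garbles the Chinese characters,
# but every ASCII byte, including "<", "/" and ">", survives unchanged.
wrong = gbk_bytes.decode("latin-1")

print(roundtrip == markup)  # True
print("</form>" in wrong)   # True
```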

Thanks a lot, guys!!

There seems to be some JavaScript redirection happening.

In this case, though, using Splash would be overkill. Simply append /index to the start URL: http://bbs.byr.cn → http://bbs.byr.cn/index
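One reason the original start URL cannot reach the login page: everything after `#` is a URL fragment, which the client keeps to itself and never sends to the server. A request for http://bbs.byr.cn/#!login is therefore the same request as one for http://bbs.byr.cn/; the `!login` part is presumably interpreted by the site's JavaScript, which Scrapy does not execute. A minimal stdlib sketch of what the server actually receives:

```python
from urllib.parse import urldefrag, urlsplit

start_url = "http://bbs.byr.cn/#!login"

# Split off the fragment ("#!login"); an HTTP client does not send it.
base, fragment = urldefrag(start_url)

# The request path the server actually sees.
path = urlsplit(base).path or "/"

print(base)      # http://bbs.byr.cn/
print(fragment)  # !login
print(path)      # /
```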

This would be the complete working spider:

from scrapy import Spider
from scrapy.http import FormRequest

class ByrSpider(Spider):
    name = 'byr'
    start_urls = ['http://bbs.byr.cn/index']

    def parse(self, response):
        return FormRequest.from_response(
            response,
            formdata={'method': 'post', 'id': 'username', 'passwd': 'password'},
            formxpath='//form[@action="/login"]',
            callback=self.after_login)

    def after_login(self, response):
        self.logger.debug(response.text)
        if 'authentication failed' in response.text:
            self.logger.debug('Login failed')
