Error: ValueError("No <form> element found in %s" % response) when trying to log in with Scrapy

Question

Problem Description:

I want to crawl some information from my college's BBS at http://bbs.byr.cn. Below is the code of my spider:

from lxml import etree
import scrapy
try:
    from scrapy.spiders import Spider
except ImportError:
    # Older Scrapy releases exposed the base class as BaseSpider
    from scrapy.spiders import BaseSpider as Spider
from scrapy.http import Request

class ITJobInfoSpider(scrapy.Spider):
    name = "ITJobInfoSpider"
    start_urls = ["http://bbs.byr.cn/#!login"]

    def parse(self, response):
        # Fill in and submit the login form found on the start page
        return scrapy.FormRequest.from_response(
            response,
            formdata={'method': 'post', 'id': 'username', 'passwd': 'password'},
            formxpath='//form[@action="/login"]',
            callback=self.after_login
        )

    def after_login(self, response):
        print("######response body: " + response.text + "\n")
        if "authentication failed" in response.text:
            print("#######Login failed#########\n")
        return

However, with this code I often get an error: raise ValueError("No <form> element found in %s" % response)

My Investigation:

I found that this error happens when Scrapy tries to parse the HTML of http://bbs.byr.cn. Scrapy parses the page with lxml. Below is the relevant Scrapy code:

root = LxmlDocument(response, lxml.html.HTMLParser)
forms = root.xpath('//form')
if not forms:
    raise ValueError("No <form> element found in %s" % response)

So I inspected the parsed tree by printing etree.tostring(root) and found that the HTML element </form> is parsed into &lt;/form&gt;. No wonder forms = root.xpath('//form') returns an empty list.

But I don't know why this is happening. Maybe it is the HTML encoding? (The page is encoded in GBK, not UTF-8.) Thanks in advance to anyone who can help me out. BTW, if anyone wants to write code against the website, I can give you a test account; please leave an email address in the comments.

Thanks a lot, guys!
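To illustrate why an escaped form yields an empty result, here is a minimal sketch. It uses the standard library's html.parser rather than lxml, and the two sample strings are made up for illustration; they are not the actual markup served by bbs.byr.cn. The names FormCounter and count_forms are hypothetical helpers.

```python
from html.parser import HTMLParser


class FormCounter(HTMLParser):
    """Counts real <form> start tags seen by the parser."""

    def __init__(self):
        super().__init__()
        self.forms = 0

    def handle_starttag(self, tag, attrs):
        # Only genuine tags reach this hook; escaped markup arrives as text.
        if tag == "form":
            self.forms += 1


def count_forms(html):
    parser = FormCounter()
    parser.feed(html)
    parser.close()
    return parser.forms


# Illustrative samples only (assumed, not taken from the real page)
real = '<body><form action="/login" method="post"><input name="id"></form></body>'
escaped = '<body>&lt;form action="/login"&gt;&lt;input name="id"&gt;&lt;/form&gt;</body>'

print(count_forms(real))     # 1
print(count_forms(escaped))  # 0
```

When the markup is escaped, the parser sees only text, so a query for form elements has nothing to match.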

Answer 1

There seems to be some JavaScript redirection happening.

In this case, though, using Splash would be overkill. Simply append /index to the start URL: http://bbs.byr.cn → http://bbs.byr.cn/index

This would be the complete working spider:

from scrapy import Spider
from scrapy.http import FormRequest

class ByrSpider(Spider):
    name = 'byr'
    start_urls = ['http://bbs.byr.cn/index']

    def parse(self, response):
        return FormRequest.from_response(
            response,
            formdata={'method':'post','id': 'username', 'passwd':'password'},
            formxpath='//form[@action="/login"]',
            callback=self.after_login)

    def after_login(self, response):
        self.logger.debug(response.text)
        if 'authentication failed' in response.text:
            self.logger.debug('Login failed')


1 answer

Answer 1
0 2019-01-30 12:29:10
