Error: raise ValueError("No <form> element found in %s" % response) occurs when trying to log in with Scrapy
I want to crawl some info from the BBS of my college. Here is the address: http://bbs.byr.cn
Below is the code of my spider:
from lxml import etree
import scrapy
try:
    from scrapy.spiders import Spider
except:
    from scrapy.spiders import BaseSpider as Spider
from scrapy.http import Request


class ITJobInfoSpider(scrapy.Spider):
    name = "ITJobInfoSpider"
    start_urls = ["http://bbs.byr.cn/#!login"]

    def parse(self, response):
        return scrapy.FormRequest.from_response(
            response,
            formdata={'method': 'post', 'id': 'username', 'passwd': 'password'},
            formxpath='//form[@action="/login"]',
            callback=self.after_login
        )

    def after_login(self, response):
        print "######response body: " + response.body + "\n"
        if "authentication failed" in response.body:
            print "#######Login failed#########\n"
            return
However, with this code I often get an error: raise ValueError("No <form> element found in %s" % response)
I found that this error happens when Scrapy tries to parse the HTML of the URL http://bbs.byr.cn. Scrapy parses the page with lxml. Below is the relevant code:
root = LxmlDocument(response, lxml.html.HTMLParser)
forms = root.xpath('//form')
if not forms:
    raise ValueError("No <form> element found in %s" % response)
So I inspected the parsed tree with:
print etree.tostring(root)
and found that the HTML element </form> is parsed into &lt;/form&gt;. No wonder that
forms = root.xpath('//form')
returns an empty list of forms.
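The symptom described above can be reproduced without Scrapy. The following is only a minimal sketch: it uses Python 3's standard-library html.parser instead of lxml (the parser Scrapy actually uses), FormCounter and count_forms are hypothetical names, and the two markup strings are made-up samples rather than the real page from bbs.byr.cn. It illustrates that a form written as real tags is found, while the same markup in escaped form is plain text and yields no element.

```python
from html.parser import HTMLParser


class FormCounter(HTMLParser):
    """Count real <form> start tags; escaped text never reaches handle_starttag."""

    def __init__(self):
        super().__init__()
        self.forms = 0

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lower-cased from HTMLParser.
        if tag == "form":
            self.forms += 1


def count_forms(html):
    """Return the number of <form> elements opened in the given HTML string."""
    parser = FormCounter()
    parser.feed(html)
    parser.close()
    return parser.forms


# Hypothetical sample inputs, not taken from the real site.
real_markup = '<form action="/login"><input name="id"></form>'
escaped_markup = '<p>&lt;form action="/login"&gt;&lt;/form&gt;</p>'

print(count_forms(real_markup))     # 1
print(count_forms(escaped_markup))  # 0
```

A check like this on the response text, done before calling FormRequest.from_response, would make it obvious whether the downloaded page contains a login form at all.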
But I don't know why this is happening. Could it be the HTML encoding? (The HTML is encoded with GBK, not UTF-8.)
Thanks in advance to anyone who can help me out.
BTW, if anyone wants to write code against the website, I can give you a test account; please leave an email address in the comments.
Thanks a lot, guys!
There seems to be some JavaScript redirection happening. In this case using Splash would be overkill, though. Simply append /index to the start URL: http://bbs.byr.cn → http://bbs.byr.cn/index
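A related detail about the start URL in the question: everything after # is a URL fragment, which is handled on the client side and is not sent to the server, so http://bbs.byr.cn/#!login requests the same resource as http://bbs.byr.cn/. The sketch below shows this with Python 3's urllib.parse. The helper with_index_path is a hypothetical name introduced only for illustration; it is not part of Scrapy.

```python
from urllib.parse import urldefrag, urlsplit, urlunsplit


def with_index_path(url):
    """Return url without its fragment and with a path ending in /index."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    if not path.endswith("/index"):
        path += "/index"
    # The last field (fragment) is left empty: it is never sent to the server.
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))


original = "http://bbs.byr.cn/#!login"
print(urldefrag(original).url)    # http://bbs.byr.cn/
print(with_index_path(original))  # http://bbs.byr.cn/index
```

For a single spider, hard-coding the /index URL in start_urls is simpler; a helper like this is only worth having when start URLs come from elsewhere.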
This would be the complete working spider:
from scrapy import Spider
from scrapy.http import FormRequest


class ByrSpider(Spider):
    name = 'byr'
    start_urls = ['http://bbs.byr.cn/index']

    def parse(self, response):
        return FormRequest.from_response(
            response,
            formdata={'method': 'post', 'id': 'username', 'passwd': 'password'},
            formxpath='//form[@action="/login"]',
            callback=self.after_login)

    def after_login(self, response):
        self.logger.debug(response.text)
        if 'authentication failed' in response.text:
            self.logger.debug('Login failed')