In scraping, can't log in with Mechanize

My aim: on Ruby on Rails 3, get a PDF file from a site that requires you to log in before you can download it.

My method:

  • Step 1: log in to the site with Mechanize
  • Step 2: once logged in, get the PDF with Nokogiri

Apparently the login didn't succeed, because I get nothing when I debug (I'm pretty sure the Nokogiri part works; I have already tested it).

My code is below:

My Controller.rb

begin

# login to the scraped site:
agent = Mechanize.new
agent.get("http://elwatan.com/sso/inscription/inscription_payant.php")

#look for the wanted form
form = puts agent.page.parser.css('form')[1]

#login
agent.page.forms[1]["login"] = "my_login"
agent.page.forms[1]["password"] = "my_password"
agent.page.forms[1].submit


#scrape with nokogiri
docwatan = Nokogiri::HTML(open('http://www.elwatan.com/'))
@watan = {}
docwatan.xpath('//th/a').each do |link|
  @watan[link.text.strip] = link['href']
end

My View.rb

<ul id= "list">  
<% if @watan %>
<% @watan.each do |key, value| %>
<li class="List" ><a href="http://www.elwatan.com/<%= "#{value}" %>" target='_blank'>      <%= "#{key}" %></a></li><% end %>
<% end %>

And the login form, from the scraped site:

<form method="post" action="/sso/login.php" id="form-login-page">
<div id="form-login-container-page" style="color:red;text-align:center;width:100%;margin:10px 0"></div>
<input type="hidden" name="minimalist" value="1"><input type="hidden"    name="SSO_Context" value=""><div class="clear"> </div>
<label>Email<span>*</span></label>
<div class="insc-saisie">
<input class="insc-saisie-champ" type="text" id="login-page" name="login" value="">
</div>
<div class="clear"> </div>

<label>Mot de passe<span>*</span></label>
<div class="insc-saisie">
<input class="insc-saisie-champ" type="password" id="password-page" name="password"    value="">
</div>
<div class="clear"> </div>

<label><input type="checkbox" unchecked=""></label>
<div class="insc-saisie">Se souvenir</div>
<div class="clear"> </div>

<label> </label>
<div class="insc-saisie">
<a href="javascript:showLostPassForm();">Mot de passe oublié ?</a>
</div>
<div class="clear"> </div>

<label> </label>
<div class="insc-saisie">
<input class="b-connexion" type="image" src="/img/trans.gif">
</div>
<div class="clear"> </div>
<div class="clear"> </div>
<label><span>*</span></label>
<div class="insc-saisie">Saisie obligatoire</div>
<div class="clear"> </div>
</form>

Kindly note that the login is done on the page "http://elwatan.com/sso/inscription/inscription_payant.php", while the download is from "http://elwatan.com"; this could be important.
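
Whether the two host names matter depends on how the site scopes its login cookie, which the form markup alone does not reveal. As a rough illustration of the domain-matching rule in RFC 6265 (the helper `cookie_sent_to?` and its parameters are hypothetical names, and the RFC's IP-address exception is left out), a cookie set with `Domain=elwatan.com` would also be sent to `www.elwatan.com`, while a host-only cookie would not:

```ruby
# Simplified RFC 6265 domain matching (ignores the IP-address exception).
# cookie_sent_to? is a hypothetical helper, not part of Mechanize.
def cookie_sent_to?(request_host, cookie_domain, host_only:)
  host   = request_host.downcase
  domain = cookie_domain.downcase.sub(/\A\./, '') # a leading dot is ignored

  # A host-only cookie (no Domain attribute) goes back to the exact host only.
  return host == domain if host_only

  # Otherwise the host must equal the domain or be a subdomain of it.
  host == domain || host.end_with?(".#{domain}")
end

puts cookie_sent_to?('www.elwatan.com', 'elwatan.com', host_only: false) # true
puts cookie_sent_to?('www.elwatan.com', 'elwatan.com', host_only: true)  # false
```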

Thanks in advance.

Instead of:

docwatan = Nokogiri::HTML(open('http://www.elwatan.com/'))

You want to do:

docwatan = agent.get('http://www.elwatan.com/')

Otherwise the session cookie isn't sent with the request: `open` makes a fresh request that knows nothing about the cookies held by the Mechanize agent.
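
Putting the suggestion together, a minimal sketch of the whole flow might look like the following. It assumes the form field names shown in the question's markup; the helper name `login_and_collect_links` and the `ELWATAN_LOGIN` / `ELWATAN_PASSWORD` environment variables are made up for illustration, and the form is selected by its `action` rather than by index:

```ruby
# Sketch: log in and scrape with the SAME agent, so the session cookie set
# by the login response is sent with the follow-up request.
LOGIN_URL = 'http://elwatan.com/sso/inscription/inscription_payant.php'
HOME_URL  = 'http://www.elwatan.com/'

# `agent` is expected to be a Mechanize instance.
# login_and_collect_links is a hypothetical helper name.
def login_and_collect_links(agent, login, password)
  login_page = agent.get(LOGIN_URL)

  # Pick the login form by its action instead of its position on the page.
  form = login_page.form_with(action: '/sso/login.php')
  raise 'login form not found' unless form

  form['login']    = login
  form['password'] = password
  form.submit

  # Reuse the agent: its cookie jar now holds whatever the login set.
  home = agent.get(HOME_URL)

  links = {}
  home.search('//th/a').each do |link|
    links[link.text.strip] = link['href']
  end
  links
end

# Runs only when credentials are supplied through the (hypothetical)
# environment variables below.
if ENV['ELWATAN_LOGIN'] && ENV['ELWATAN_PASSWORD']
  require 'mechanize'
  p login_and_collect_links(Mechanize.new,
                            ENV['ELWATAN_LOGIN'],
                            ENV['ELWATAN_PASSWORD'])
end
```

In a controller, the returned hash would be assigned to `@watan`. Passing the agent in keeps one cookie jar for both requests, which is the point of the fix.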
