How can I get WWW::Mechanize to log in to Wells Fargo's website?

I am trying to use Perl's WWW::Mechanize to log in to my bank and pull transaction information. After I log in to my bank (Wells Fargo) through a browser, it briefly displays a temporary web page saying something along the lines of "please wait while we verify your identity". After a few seconds it proceeds to the bank's webpage where I can get my bank data. The only difference is that the final URL has several more GET parameters appended to the URL of the temporary page, which only had a sessionID parameter.

I was able to get WWW::Mechanize to log in from the login page, but it gets stuck on the temporary page. There is a <meta http-equiv="Refresh" ... tag in the header, so I tried $mech->follow_meta_redirect, but it didn't get me past that temporary page either.

Any help getting past this would be appreciated. Thanks in advance.

Here is the barebones code that gets me stuck at the temporary page:

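#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Placeholder credentials; replace with your own.
my $userid   = 'your_username';
my $password = 'your_password';

my $mech = WWW::Mechanize->new();
$mech->agent_alias( 'Linux Mozilla' );

# Load the home page and submit the sign-on form.
$mech->get( "https://www.wellsfargo.com" );
$mech->submit_form(
    form_number => 2,
    fields => {
        userid   => $userid,
        password => $password,
    },
    button => "btnSignon"
);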

Sorry, it has been years since I've coded Perl. However, since there's no "copy and paste" answer posted for this question yet, here's how to scrape Wells Fargo in Ruby:

require 'rubygems'
require 'mechanize'

username = 'your_username'
password = 'your_password'

agent = Mechanize.new
agent.user_agent_alias = 'Windows IE 6'

# get first page
page = agent.get('https://online.wellsfargo.com/signon/')

# find and fill form
form = page.form_with(:name => 'Signon')      
form['userid'] = username
form['password'] = password
page = agent.submit form

# find the refresh url
page.body.match /content="1;URL=(.*?)"/
nexturl = $1

# wait a little while and then get the next page
sleep 3
page = agent.get nexturl

# If you have multiple accounts, you can use this. If you just have a single account, you can remove this block
companies = [['Account1', '123456789'], 
             ['Account2', '123456789']]

companies.each do |name, id|
  form = page.form_with(:name => 'ChangeViewFormBean')
  form['viewKey'] = id
  page = agent.submit form

  available_balance = page.search("#cashTotalAvailBalance").text.strip

  puts "#{name}: #{available_balance}"
  sleep 2
end

Works Cited: There's a guy who wrote a version of this script, posted it to his code directory and then forwarded the whole thing to his blog. His last name is Youngblood or similar. I found the source in the Internet Archive's Wayback Machine and modified it to make what you see above. So, thanks Mr. Youngblood or similar, wherever you are, and thanks for teaching me the meta scrape trick!
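The meta-refresh step in the script above depends on the tag being exactly content="1;URL=...". As a rough sketch only, here is a slightly more tolerant way to pull out the delay and the target using nothing but the standard library. The helper name meta_refresh_target and the bank.example URLs are made up for illustration, and I have not checked this against the bank's actual markup.

```ruby
require 'uri'

# Hypothetical helper: extract the delay and target of a
# <meta http-equiv="Refresh"> tag from an HTML string.
# Returns [delay_seconds, absolute_url], or nil when no usable tag is found.
def meta_refresh_target(html, base_url)
  # Find the first <meta> tag whose http-equiv is "refresh" (any case).
  tag = html[/<meta[^>]*http-equiv\s*=\s*["']?refresh["']?[^>]*>/i]
  return nil unless tag

  # Read the content attribute, double- or single-quoted.
  content = tag[/content\s*=\s*"([^"]*)"/i, 1] || tag[/content\s*=\s*'([^']*)'/i, 1]
  return nil unless content

  # content looks like "1;URL=/next/page?x=1"; split only on the first ';'.
  delay, target = content.split(';', 2)
  url = target.to_s.sub(/\A\s*url\s*=\s*/i, '').strip
  url = url.gsub(/\A['"]|['"]\z/, '').gsub('&amp;', '&')
  return nil if url.empty?

  # Resolve relative targets against the page that contained the tag.
  [delay.to_f, URI.join(base_url, url).to_s]
end
```

With Mechanize you would then do something like sleep(delay) followed by agent.get(url), the same as the sleep 3 and agent.get nexturl lines above.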

You'll need to reverse-engineer what's happening on that intermediary page. Does it use JavaScript to set some cookies, for example? Mech won't parse or execute JavaScript on a page, so it may be trying to follow the meta-refresh but missing some crucial information about what needs to happen for the final request.

Try using a tool like Firebug to watch the request that's sent when the browser follows the meta-refresh. Examine all the request headers, including cookies, that are sent to request the final page. Then use Mech to duplicate that.
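Once you have copied the browser's request headers out of Firebug and dumped the ones your script sends, a small comparison makes the gaps obvious. This is only a sketch: header_diff is a name I made up, and the header values in the usage note are invented.

```ruby
# Hypothetical helper: report request headers the browser sent but the
# script did not, plus headers present in both whose values differ.
# Header names are compared case-insensitively.
def header_diff(browser_headers, script_headers)
  normalize = ->(h) { h.to_h { |k, v| [k.to_s.downcase, v.to_s.strip] } }
  browser = normalize.call(browser_headers)
  script  = normalize.call(script_headers)

  missing = browser.keys - script.keys
  changed = (browser.keys & script.keys).reject { |k| browser[k] == script[k] }
  { missing: missing.sort, changed: changed.sort }
end
```

For example, if the browser sent a Referer header and a longer Cookie header than your script did, the result lists "referer" under :missing and "cookie" under :changed, which tells you what to add before the final request.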

If you know the location of the next page, you can try fetching it after adding the additional GET parameters with:

$mech->add_header($name => $value);
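Note that add_header sets HTTP request headers, whereas GET parameters are part of the URL's query string. If the final page really only differs by extra query parameters, one option is to build that URL yourself and request it. Here is a sketch in Ruby (matching the answer above) using only the standard library. The helper name with_query_params and the parameter names are hypothetical, since the real ones have to come from watching the browser.

```ruby
require 'uri'

# Hypothetical helper: append extra GET parameters to a URL while
# keeping whatever query string it already has.
def with_query_params(url, extra)
  uri = URI.parse(url)
  # Existing pairs first, then the new ones, all as [name, value] strings.
  params = URI.decode_www_form(uri.query.to_s) + extra.map { |k, v| [k.to_s, v.to_s] }
  uri.query = URI.encode_www_form(params)
  uri.to_s
end
```

The resulting string is what you would pass to agent.get in Ruby or $mech->get in Perl.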

First you need to know whether JavaScript is involved. I recommend using Web Developer (NoScript also works) to disable JavaScript and then trying to log in through the browser (but first clear all cookies related to your target site!).

If you can still log in with JavaScript disabled, then this is not a JavaScript issue and you need to investigate the HTTP headers (it may be the x,y coordinates of the clicked button, for example, or some cookies received only when you load a CSS file, etc.).

I recommend HttpFox for checking HTTP headers. Start HttpFox logging and then log in again (by the way, disabling images beforehand will significantly reduce your log). After that, check every request and its corresponding response to find where hidden cookies are set or where a hidden form parameter is created.

If you cannot log in after disabling JavaScript, then you need to look at the headers too. Compare the cookies provided in the HTTP response headers with the cookies sent in the later requests. Once you find the HTML with the "malicious" JavaScript, you can analyze that JavaScript to work out the algorithm by which the cookie (or form parameter) is created.

Your last step is to reproduce this cookie or form parameter in your WWW::Mechanize request.
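The cookie comparison described above can be done by hand, but here is a rough sketch of it in Ruby (the language of the earlier answer) in case the log is long. Any cookie that shows up in a request's Cookie header without ever appearing in a Set-Cookie response header was most likely created by JavaScript. The helper names and the sample cookie names in the usage note are made up for illustration.

```ruby
# Names of cookies the server set, taken from Set-Cookie header values.
def cookies_set_by_server(set_cookie_headers)
  set_cookie_headers.map { |line| line[/\A\s*([^=;\s]+)\s*=/, 1] }.compact.uniq
end

# Names of cookies the browser sent, taken from one Cookie header value.
def cookies_sent(cookie_header)
  cookie_header.split(';').map { |pair| pair[/\A\s*([^=;\s]+)\s*=/, 1] }.compact
end

# Cookies that were sent but never set by a response header, which makes
# them candidates for having been created by JavaScript on the page.
def script_created_cookies(set_cookie_headers, cookie_header)
  cookies_sent(cookie_header) - cookies_set_by_server(set_cookie_headers)
end
```

If the responses set session and lang but the later request also carries a cookie named devicePrint, the last helper returns only devicePrint, and that is the one to hunt for in the page's JavaScript.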
