How do I use and debug WWW::Mechanize?

I am very new to Perl and I am learning on the fly while I try to automate some projects for work. So far it has been a lot of fun.

I am working on generating a report for a customer. I can get this report from a web page I have access to. First, I need to fill in a form with my user name and password, choose a server from a drop-down list, and log in. Second, I need to click a link to the report section. Third, I need to fill in a form to create the report.

Here is what I have written so far:

my $mech = WWW::Mechanize->new();
my $url = 'http://X.X.X.X/Console/login/login.aspx';

$mech->get( $url );

$mech->submit_form(
    form_number => 1,
    fields      => {
        'ctl00$ctl00$cphVeriCentre$cphLogin$txtUser'    => 'someone',
        'ctl00$ctl00$cphVeriCentre$cphLogin$txtPW'      => '12345',
        'ctl00$ctl00$cphVeriCentre$cphLogin$ddlServers' => 'Live',
        button => 'Sign-In'
    },
);
die unless ($mech->success);

$mech->dump_forms();

I don't understand why, but when I look at what dump_forms() outputs after this, I see the first login page, while I believe I should have reached the next page after a successful login.

Could a cookie be affecting the login attempt?

Is there anything else I am doing wrong?

I appreciate your help, Yaniv

This is several months after the fact, but I resolved the same issue based on a similar question I asked. See "Is it possible to automate postback from the client side?" for more info.

I used Python's mechanize instead of Perl, but the same principle applies.

Summarizing my earlier response:

ASP.NET pages need a hidden parameter called __EVENTTARGET in the form, which won't exist when you use mechanize normally.

When a normal user visits these pages, a __doPostBack('foo') function gives the relevant value to __EVENTTARGET via a JavaScript onclick event on each of the links. Since mechanize doesn't run JavaScript, you'll need to set these values yourself.

The Python solution is below, but it shouldn't be too tough to adapt it to Perl.

def add_event_target(form, target):
    # Create a new hidden __EVENTTARGET control and set it to the given value.
    # The control is missing when the page is loaded through mechanize; I
    # suspect it is normally generated by JavaScript or depends on the user agent.
    form.new_control('hidden', '__EVENTTARGET', attrs=dict(name='__EVENTTARGET'))
    form.set_all_readonly(False)
    form["__EVENTTARGET"] = target
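
To find out which target to pass to add_event_target, the value can usually be read from the link itself, since ASP.NET postback links tend to look like javascript:__doPostBack('name','argument'). Here is a minimal sketch using only the standard library, assuming that markup. The helper name parse_do_postback and the control name in the example are made up for illustration. The second argument, when present, corresponds to the __EVENTARGUMENT hidden field.

```python
import re

# Matches calls such as __doPostBack('ctl00$lnkReports','') inside an href or
# onclick attribute. The second argument is optional in this pattern.
DOPOSTBACK_RE = re.compile(
    r"__doPostBack\(\s*'([^']*)'\s*(?:,\s*'([^']*)'\s*)?\)"
)


def parse_do_postback(attr_value):
    """Return (event_target, event_argument), or None if there is no call."""
    match = DOPOSTBACK_RE.search(attr_value)
    if match is None:
        return None
    # A missing second argument is treated as an empty __EVENTARGUMENT.
    return match.group(1), match.group(2) or ""


# Hypothetical link markup; the control name is invented for this example.
href = "javascript:__doPostBack('ctl00$lnkReports','')"
print(parse_do_postback(href))
```

The first element of the returned pair is what you would pass as target. If the page HTML-escapes the quotes inside the attribute, unescape the value before matching.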

If you are on Windows, use Fiddler to see what data is sent when you perform this process manually, and then compare it to the data captured when your script performs it.

In my experience, a web debugging proxy like Fiddler is more useful than Firebug when inspecting form posts.
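
Once both request bodies are captured, the comparison itself can be automated. The following is a rough sketch, assuming both bodies are application/x-www-form-urlencoded text copied out of the proxy. The function name diff_form_bodies and the sample bodies are invented for illustration.

```python
from urllib.parse import parse_qs


def diff_form_bodies(browser_body, script_body):
    """Compare two URL-encoded form bodies field by field."""
    # keep_blank_values=True so that empty fields such as "__EVENTTARGET="
    # still count as present.
    browser = parse_qs(browser_body, keep_blank_values=True)
    script = parse_qs(script_body, keep_blank_values=True)
    shared = set(browser) & set(script)
    return {
        "missing_from_script": sorted(set(browser) - set(script)),
        "extra_in_script": sorted(set(script) - set(browser)),
        "different_values": sorted(n for n in shared if browser[n] != script[n]),
    }


# Made-up captures: what the browser sent versus what the script sent.
browser_body = "__VIEWSTATE=abc&__EVENTTARGET=&txtUser=someone&btnSignIn=Sign-In"
script_body = "txtUser=someone&txtPW=12345"

for kind, names in diff_form_bodies(browser_body, script_body).items():
    print(kind, names)
```

Fields listed under missing_from_script are the first candidates to add to the scripted request.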

You can only mechanize stuff that you know. Before you write any more code, I suggest you use a tool like Firebug and inspect what happens in your browser when you do this manually.

Of course there might be cookies involved. Or maybe you forgot a hidden form parameter? Only you can tell.

EDIT:

  • WWW::Mechanize should take care of cookies without any further intervention.
  • You should always check whether the methods you called were successful. Does the first get() work?
  • It might be useful to take a look at the server logs to see what is actually requested and what HTTP status code is sent as a response.
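
A successful HTTP status alone may not prove that the login worked, because a rejected postback can come back as a normal page that shows the login form again. One possible check, sketched here with the standard library only, is to also look for a password field in the returned HTML. This is a heuristic and an assumption about the site, not something WWW::Mechanize does for you. The names PasswordFieldFinder and login_looks_successful are made up for this sketch.

```python
from html.parser import HTMLParser


class PasswordFieldFinder(HTMLParser):
    """Set .found to True if the page contains an <input type="password">."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # Attribute values can be None (e.g. a bare attribute), hence the "or".
        if tag == "input" and (dict(attrs).get("type") or "").lower() == "password":
            self.found = True


def login_looks_successful(status_code, html):
    """Heuristic: a 2xx status and no password field in the returned page."""
    if not 200 <= status_code < 300:
        return False
    finder = PasswordFieldFinder()
    finder.feed(html)
    return not finder.found
```

For example, login_looks_successful(200, page_html) returns False when page_html still contains the login form.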

I have found the Wireshark utility very helpful when writing web automation with WWW::Mechanize. It helps in a few ways:

  1. It lets you see whether your HTTP request was successful.
  2. It shows the reason for a failure at the HTTP level.
  3. It lets you trace the exact data you pass to the server and see what you receive back.

Just set an HTTP filter for the network traffic and start your Perl script.

The very short gist of aspx pages is that they hold all of the local session information in a couple of variables prefixed by "__" in the general aspxform. Usually this is a top-level form and all form elements are part of it, but I guess that can vary by implementation.

For the particular implementation I was dealing with, I needed to worry about two of these state variables, specifically:

__VIEWSTATE
__EVENTVALIDATION.

Your goal is to make sure that these variables are included in the form you submit, since they might be part of the main aspxform I mentioned above while you are probably submitting a different form.

When a browser loads an aspx page, a piece of JavaScript passes this session information along within the ASP server/client interaction. We don't have that luxury with Perl's WWW::Mechanize, so you will need to post these values yourself by adding the elements to the current form using Mechanize.

In the case that I just solved, I basically did this:

use strict;
use warnings;
use WWW::Mechanize;

my $browser = WWW::Mechanize->new();

# fetch the login page to get the initial session variables
my $login_page = 'http://www.example.com/login.aspx';
$browser->get($login_page);

# very short way to find the fields so you can add them to your post
my $viewstate  = ($browser->find_all_inputs( type => 'hidden', name => '__VIEWSTATE' ))[0]->value;
my $validation = ($browser->find_all_inputs( type => 'hidden', name => '__EVENTVALIDATION' ))[0]->value;

# post back the form data you need along with the session variables
my $response = $browser->post( $login_page, [
    username          => 'user',
    password          => 'password',
    __VIEWSTATE       => $viewstate,
    __EVENTVALIDATION => $validation,
] );

# finally get back the content and make sure it looks right
print $response->content();
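
The same idea can be sketched in Python with the standard library only: collect the hidden inputs from the fetched page, merge in the visible fields, and encode the result as the POST body. The sample HTML, its field values, and the names HiddenFieldCollector and build_postback_body are invented for illustration. A real page may contain more hidden fields or more than one form, and this sketch does not tell forms apart.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFieldCollector(HTMLParser):
    """Collect name/value pairs of every <input type="hidden"> in a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_postback_body(html, visible_fields):
    """Return a URL-encoded body: hidden state fields plus the given fields."""
    collector = HiddenFieldCollector()
    collector.feed(html)
    payload = dict(collector.fields)
    payload.update(visible_fields)  # values you type in override hidden ones
    return urlencode(payload)


# Made-up login page; the state values are placeholders, not real view state.
sample = """
<form id="aspnetForm" method="post" action="login.aspx">
  <input type="hidden" name="__VIEWSTATE" value="dDwtMTA4" />
  <input type="hidden" name="__EVENTVALIDATION" value="wEWAgK" />
  <input type="text" name="username" />
  <input type="password" name="password" />
</form>
"""

print(build_postback_body(sample, {"username": "user", "password": "password"}))
```

Collecting every hidden input, rather than naming __VIEWSTATE and __EVENTVALIDATION explicitly, means the sketch keeps working if the page adds further state fields.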
