
How to log in to a vBulletin forum using HTMLUnit?

I'm a total newbie at HTMLUnit, trying to scrape a vBulletin web forum. I'm having trouble getting it to enter the username/password and actually log in.

Here's my code so far:

package scraper;

import java.io.IOException;
import java.net.UnknownHostException;

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class Scraper {

    public static void main(String[] args) {
        try {
            Scraper ocau = new Scraper("http://forums.overclockers.com.au/forumdisplay.php?f=15&order=desc");
        } catch (UnknownHostException e) {
            e.printStackTrace();
        }
    }

    public Scraper(String url) throws UnknownHostException {
        WebClient webClient = new WebClient(BrowserVersion.FIREFOX_24);
        webClient.getOptions().setJavaScriptEnabled(false);
        webClient.getOptions().setCssEnabled(false);

        HtmlPage page;
        try {
            page = webClient.getPage(url);


            HtmlForm login = page.getForms().get(0);
            System.out.println(login);

        } catch (FailingHttpStatusCodeException | IOException e) {
            e.printStackTrace();
        }

        webClient.closeAllWindows();
    }
}

The output of this is just the login form (I think):

HtmlForm[<form action="login.php?do=login" method="post" onsubmit="md5hash(vb_login_password, vb_login_md5password, vb_login_md5password_utf, 0)">]

The script/form on the page:

<script type="text/javascript" src="clientscript/vbulletin_md5.js?v=384"></script>
<form action="login.php?do=login" method="post" onsubmit="md5hash(vb_login_password, vb_login_md5password, vb_login_md5password_utf, 0)">
<input type="hidden" name="do" value="login" />
<input type="hidden" name="url" value="/forumdisplay.php?f=15&amp;order=desc" />
<input type="hidden" name="vb_login_md5password" />
<input type="hidden" name="vb_login_md5password_utf" />
<input type="hidden" name="s" value="" />
<input type="hidden" name="securitytoken" value="guest" />

I'm not too sure where to go from here to actually enter the username/password and click submit. I read this answer, which said that I need to set vb_login_md5password and vb_login_md5password_utf, which are hidden inputs on the page, but I have no idea how to reference or set these. There is a JavaScript MD5 script referenced in the HTML at src="clientscript/vbulletin_md5.js?v=384".

Any help would be greatly appreciated.

Edit: Thanks to arya, it is now working. I had to use this code to log in and print the page:

    ((HtmlElement) page.getFirstByXPath("//fieldset/table/tbody/tr/td/input")).type("secretusername");
    ((HtmlElement) page.getFirstByXPath("//fieldset/table/tbody/tr[2]/td/input")).type("secretpassword");
    HtmlPage loggedin = ((HtmlElement) page.getFirstByXPath("//tr[4]/td/input")).click();           
    System.out.println(loggedin.asXml());

Try entering the values with XPath and see if that works.

// Requires: import com.gargoylesoftware.htmlunit.html.HtmlElement;
// getFirstByXPath returns a generic type, so cast each result to HtmlElement before calling type() or click().
((HtmlElement) page.getFirstByXPath("//fieldset/table/tbody/tr/td/input")).type("yourid");

((HtmlElement) page.getFirstByXPath("//fieldset/table/tbody/tr[2]/td/input")).type("yourpass");

((HtmlElement) page.getFirstByXPath("//tr[4]/td/input")).click();

If the solution above does not work, you would have to capture the traffic with something like Fiddler and emulate it with HTMLUnit. Let me know if it does not work so I can edit my answer.
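If it does come to emulating the request, the sketch below shows roughly what the form body might look like, using only the Java standard library. It rests on several assumptions that captured traffic should confirm or correct: that the md5hash script fills both hidden fields with the lowercase hex MD5 of the password (the two values can differ for non-ASCII passwords, which this sketch ignores), that it blanks the plaintext field afterwards, and that the username input is named vb_login_username (that name does not appear in the question). The class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds the POST body that the login form would send.
final class LoginRequestSketch {

    /** Lowercase hex MD5 digest of the UTF-8 bytes of the given text. */
    public static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }

    /** Form fields in page order; values marked "assumed" are not confirmed by the question. */
    public static Map<String, String> loginFields(String username, String password) {
        String hash = md5Hex(password);
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("do", "login");
        fields.put("url", "/forumdisplay.php?f=15&order=desc");
        fields.put("vb_login_username", username);    // assumed field name
        fields.put("vb_login_password", "");          // assumed: script clears the plaintext
        fields.put("vb_login_md5password", hash);     // assumed: hex MD5 of the password
        fields.put("vb_login_md5password_utf", hash); // assumed: same value for ASCII passwords
        fields.put("s", "");
        fields.put("securitytoken", "guest");
        return fields;
    }

    /** Encodes the fields as application/x-www-form-urlencoded. */
    public static String toFormBody(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            body.add(encode(field.getKey()) + "=" + encode(field.getValue()));
        }
        return body.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always supported.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Prints the body that would be POSTed to login.php?do=login.
        System.out.println(toFormBody(loginFields("yourid", "yourpass")));
    }
}
```

The hashes have to be computed by hand here because the question's code disables JavaScript, so the onsubmit handler never runs. If you stay inside HTMLUnit rather than sending the request yourself, the same values could probably be written into the hidden inputs by name, for example with login.getInputByName("vb_login_md5password").setValueAttribute(hash), before submitting the form.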


 