
Java: logging in to a website that uses complex JavaScript

First, I should say that I already have this working with PhantomJS and Selenium: I load PhantomJS, open the URL (sports.coral.co.uk) and then check my balance. However, I am trying to find a more lightweight option.

I have tried sending the HTTP GET/POST requests manually with Apache's HttpClient. Monitoring the login process with Postman for Chrome shows that four requests are sent once the login button is pressed. I have tried editing and re-sending them with Postman. However, as far as I can tell, a requestID is sent along with the requests, and it is generated by the JavaScript on the page:

var requestId = (new Date().getTime()) + Math.round(Math.random() * 1000000);
var failedTimer = setTimeout('iapiRequestFailed(' + requestId + ')', iapiConf['loginDomainRetryInterval'] * 1000);
iapiRegisterRequestId(requestId, iapiCALLOUT_MESSAGES, failedTimer, request[3], request[4], request[5]);
return;

It looks like the original ID is a randomly generated number that is then registered by another JavaScript function. I'm guessing the login partly fails because I cannot provide an acceptable requestID. When I re-send the old requests, the user is partially logged in, but once I click on My Account the site says an error occurred. The only explanation I can see is the requestID.
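For anyone replaying the requests from Java, the client-side expression quoted above can be mirrored directly. This is only a sketch: the class and method names (`RequestIdSketch`, `newRequestId`) are made up for illustration, and it reproduces just the arithmetic of the `requestId` line. Whether the server accepts an ID built this way, or also expects whatever `iapiRegisterRequestId` does, is not something the snippet on the page tells us.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper; mirrors only the arithmetic of the page's JavaScript.
final class RequestIdSketch {

    // JS original: (new Date().getTime()) + Math.round(Math.random() * 1000000)
    // nowMillis is the epoch time in milliseconds, random is a value in [0, 1).
    static long newRequestId(long nowMillis, double random) {
        return nowMillis + Math.round(random * 1_000_000);
    }

    // Convenience overload using the current time and a fresh random value.
    static long newRequestId() {
        return newRequestId(System.currentTimeMillis(),
                ThreadLocalRandom.current().nextDouble());
    }

    public static void main(String[] args) {
        // A new ID per request, instead of re-sending the one captured in Postman.
        System.out.println(newRequestId());
    }
}
```

Splitting out the two-argument overload keeps the arithmetic deterministic, so it can be checked against a value captured from the browser.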

I then decided to give HtmlUnit a go, as it seems like the type of tool I need. I did some research on using HttpClient together with a JavaScript engine such as Rhino, and HtmlUnit appears to be the tool for that. However, before I even try to log in, I get errors caused by the JavaScript on the page.

Here's the simple bit of code I use to connect to the page:

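import java.util.logging.Level;

import org.apache.commons.logging.LogFactory;
import org.junit.Test; // assuming JUnit 4

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

@Test
public void htmlunit() throws Exception {
    // Silence HtmlUnit's own logging (commons-logging and java.util.logging).
    LogFactory.getFactory().setAttribute("org.apache.commons.logging.Log", "org.apache.commons.logging.impl.NoOpLog");

    java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(Level.OFF);
    java.util.logging.Logger.getLogger("org.apache.commons.httpclient").setLevel(Level.OFF);

    // Emulate Chrome, run the page's JavaScript, and don't abort on script or HTTP errors.
    WebClient client = new WebClient(BrowserVersion.CHROME);
    client.getOptions().setJavaScriptEnabled(true);
    client.getOptions().setThrowExceptionOnScriptError(false);
    client.getOptions().setThrowExceptionOnFailingStatusCode(false);

    HtmlPage page = client.getPage("http://sports.coral.co.uk");
    System.out.println(page.asText());
    client.close();
}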

When I comment out the LogFactory line, I can see that lots of warnings are logged:

WARNING: Obsolete content type encountered: 'application/x-javascript'.
Feb 09, 2016 4:33:34 PM com.gargoylesoftware.htmlunit.html.HtmlScript isExecutionNeeded

WARNING: Script is not JavaScript (type: application/ld+json, language: ). Skipping execution.

etc.

I'm guessing this means that HtmlUnit isn't compatible with the JavaScript that's being executed on the page? I'm not very good with JavaScript, and the scripts on the page are obfuscated, which makes them even harder to read. What I don't understand is why the JS executes without error under PhantomJS or ChromeDriver but not under HtmlUnit. Is it because the Rhino engine isn't good enough to execute it? Am I missing something obvious?

This code turns off all the JavaScript warnings that come from the HtmlUnit library itself rather than from your code.

LogFactory.getFactory().setAttribute("org.apache.commons.logging.Log", "org.apache.commons.logging.impl.NoOpLog");

java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(Level.OFF); 
java.util.logging.Logger.getLogger("org.apache.commons.httpclient").setLevel(Level.OFF);

WebClient client = new WebClient(BrowserVersion.CHROME);
client.getOptions().setJavaScriptEnabled(true);
client.getOptions().setThrowExceptionOnScriptError(false);
client.getOptions().setThrowExceptionOnFailingStatusCode(false);

HtmlPage page = client.getPage("http://sports.coral.co.uk");
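The `java.util.logging` calls above work because logger levels are inherited through the dotted name hierarchy: setting `com.gargoylesoftware` to `Level.OFF` also silences `com.gargoylesoftware.htmlunit.html.HtmlScript`, the class named in the warnings quoted in the question. The sketch below uses only the JDK to show that behaviour. The class name `QuietHtmlUnitLogs` and the logger name `com.example.myapp` are invented for the illustration, and it assumes the default JDK logging configuration. It also keeps a strong reference to the configured logger, since the `LogManager` holds loggers only weakly and a level set on an otherwise unreferenced logger can be lost to garbage collection.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper; uses only java.util.logging from the JDK.
final class QuietHtmlUnitLogs {

    // Strong reference so the configured logger is not garbage-collected.
    private static final Logger HTMLUNIT_ROOT = Logger.getLogger("com.gargoylesoftware");

    // Turns off everything logged under the com.gargoylesoftware namespace.
    static void silence() {
        HTMLUNIT_ROOT.setLevel(Level.OFF);
    }

    // Reports whether a WARNING sent to the named logger would be published.
    static boolean wouldLogWarning(String loggerName) {
        return Logger.getLogger(loggerName).isLoggable(Level.WARNING);
    }

    public static void main(String[] args) {
        silence();
        // Child logger inherits OFF from com.gargoylesoftware.
        System.out.println(wouldLogWarning("com.gargoylesoftware.htmlunit.html.HtmlScript"));
        // Unrelated (hypothetical) application logger is unaffected.
        System.out.println(wouldLogWarning("com.example.myapp"));
    }
}
```

Note that this only hides the messages. Warnings such as "Obsolete content type encountered" are informational, so silencing them does not change how the page's scripts are executed.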
