
Using Jsoup to get an Element from a page

I want to log in to an HTTPS website using Jsoup and then make 3-4 subsequent service calls to check whether a job is done.

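    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;

    public class JSOUPTester {
        public static void main(String[] args) {
            System.out.println("Inside the JSOUP testing method");
            String url = "https://someloginpage.com";
            try {
                Document doc = Jsoup.connect(url).get();
                // getElementById returns null if the downloaded HTML has no such id,
                // and calling .text() on null throws the NullPointerException below
                String S = doc.getElementById("username").text();  // LINE 1
                String S1 = doc.getElementById("password").text(); // LINE 2
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }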

Exception:

    java.lang.NullPointerException
        at JSOUPTester.main(JSOUPTester.java:7)

I have checked in Chrome that these pages contain elements with the ids "username" and "password". The lines above are throwing a NullPointerException. What am I doing wrong here?
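The NullPointerException itself means `getElementById` returned `null`: Jsoup found no element with that id in the HTML it actually downloaded, which can differ from what Chrome shows after running the page's scripts. Below is a minimal null-safe sketch (the class and method names are hypothetical) that reports a missing field instead of crashing. It reads the field with `val()`, since `text()` on an `<input>` element returns an empty string even when the element is found.

```java
import java.util.Optional;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class FieldLookup {

    // Returns the value attribute of the element with the given id, or an empty
    // Optional when the HTML that Jsoup received contains no such element
    // (for example because a script inserts the form later).
    static Optional<String> inputValue(Document doc, String id) {
        Element el = doc.getElementById(id); // null if the id is absent
        return Optional.ofNullable(el).map(Element::val);
    }

    // Convenience overload for raw HTML, e.g. a page saved with curl.
    static Optional<String> inputValue(String html, String id) {
        return inputValue(Jsoup.parse(html), id);
    }

    public static void main(String[] args) {
        // Hypothetical markup: an empty container that JavaScript would fill in,
        // versus a form that is present in the server's HTML.
        String jsRendered = "<div id='login'></div><script>/* builds the form */</script>";
        String staticForm = "<form><input id='username' value='alice'></form>";

        System.out.println(inputValue(jsRendered, "username").isPresent()); // false
        System.out.println(inputValue(staticForm, "username").orElse("?")); // alice
    }
}
```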

A number of things can cause this. Without the URL I can't be certain, but here are some clues:

  • Some pages load their content via AJAX. Jsoup can't deal with this, since it does not interpret any JavaScript. You can check for this by downloading the page with curl, or by viewing it in a browser with JavaScript turned off. To deal with pages that use JavaScript to render themselves, you can use tools like Selenium WebDriver or HtmlUnit.

  • The web server of the page you are trying to load might require a cookie to be present. You need to look at the network traffic that happens while that page loads. In Chrome or Firefox you can see this in the Network tab of the developer tools.

  • The web server might respond differently to different clients. That is why you may have to set the User-Agent string in your Jsoup HTTP request to that of a known browser:

    Jsoup.connect("url").userAgent("Mozilla/5.0")

  • By default, Jsoup truncates the downloaded HTML source at 1 MB, so elements beyond that point are simply missing from the parsed document (newer Jsoup versions may use a different default). You can turn the limit off with 0 or set it to a larger value if needed:

    Jsoup.connect("url").maxBodySize(0)

  • Jsoup might time out on the request (the default timeout is 30 seconds). To change the timeout, given in milliseconds, use

    Jsoup.connect("url").timeout(milliseconds)

  • There might be other reasons I have not thought of.
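The cookie, User-Agent, body-size and timeout settings above can be combined into one session that also covers the login-then-poll flow from the question. This is a minimal sketch assuming a current Jsoup release; the class name, the status URL, the form field names and the credentials are hypothetical placeholders, and the 10-second timeout is an arbitrary choice.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SessionFetch {

    // Applies the request settings discussed above; the concrete values are
    // illustrative assumptions, not recommendations.
    static Connection configure(String url, Map<String, String> cookies) {
        return Jsoup.connect(url)
                .userAgent("Mozilla/5.0") // identify as a browser
                .maxBodySize(0)           // 0 disables the body size limit
                .timeout(10_000)          // timeout in milliseconds
                .cookies(cookies);        // send the cookies collected so far
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical URLs: replace with the real login page and status service.
        String loginUrl = "https://someloginpage.com";
        String statusUrl = "https://someloginpage.com/job/status";

        // 1. Load the login page and keep whatever cookies the server sets.
        Connection.Response loginPage = configure(loginUrl, Map.of())
                .method(Connection.Method.GET)
                .execute();
        Map<String, String> cookies = new HashMap<>(loginPage.cookies());

        Document doc = loginPage.parse();
        if (doc.getElementById("username") == null) {
            // Not in the HTML Jsoup received: possibly rendered by JavaScript.
            System.out.println("No #username field; Jsoup received:\n" + doc.outerHtml());
            return;
        }

        // 2. Submit the form. Field names and credentials are hypothetical;
        //    use the form's real "name" attributes and "action" URL.
        Connection.Response loggedIn = configure(loginUrl, cookies)
                .data("username", "me", "password", "secret")
                .method(Connection.Method.POST)
                .execute();
        cookies.putAll(loggedIn.cookies());

        // 3. Reuse the same cookies for each follow-up service call.
        Document status = configure(statusUrl, cookies).get();
        System.out.println(status.title());
    }
}
```

Pages that only build the form with JavaScript will still hit the early return; for those, a browser-driving tool such as Selenium WebDriver or HtmlUnit is needed instead.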

Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.