
Jsoup Java parser: cannot get all HTML content from a website

I am trying to get all the HTML tags from the page at https://launch.stellar.org/#/login.

However, my result does not contain any input tags, even though I can see them when I inspect the page with the F12 tools in Firefox.

I do not understand why this happens. What is the solution to this problem?

Here is my code:

import java.io.BufferedReader;
import java.io.DataOutputStream;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.URL;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;
import javax.net.ssl.HttpsURLConnection;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.helper.HttpConnection.Response;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class HttpUrlConnect {

    private HttpsURLConnection conn;
    private final String USER_AGENT = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.131 Safari/537.36";

    public static void main(String[] args) throws Exception {
        String url = "https://launch.stellar.org/#/login";
        HttpUrlConnect http = new HttpUrlConnect();

        // 1. Send a "GET" request, so that you can extract the form's data.
        String page = http.GetPageContent(url);
        Document doc = Jsoup.parse(page);
        System.out.println(doc);
    }

    String GetPageContent(String url) throws Exception {
        URL obj = new URL(url);
        conn = (HttpsURLConnection) obj.openConnection();

        // default is GET
        conn.setRequestMethod("GET");
        conn.setUseCaches(false);

        // act like a browser
        conn.setRequestProperty("Host", "wallet.stellar.org");
        conn.setRequestProperty("User-Agent", USER_AGENT);
        conn.setRequestProperty("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8");
        conn.setRequestProperty("Accept-Language", "vi-VN,vi;q=0.8,fr-FR;q=0.6,fr;q=0.4,en-US;q=0.2,en;q=0.2");

        int responseCode = conn.getResponseCode();
        System.out.println("\nSending 'GET' request to URL : " + url);
        System.out.println("Response Code : " + responseCode);

        // Read the response body line by line into a single string.
        BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
        String inputLine;
        StringBuffer response = new StringBuffer();
        while ((inputLine = in.readLine()) != null) {
            response.append(inputLine);
        }
        in.close();
        return response.toString();
    }
}

I downloaded the jsoup library from http://jsoup.org/download

"But my result does not have any input tags like the ones I see when I open this website in the F12 tools in Firefox"

The F12 tools (Inspector/Firebug) show you the page source with all the modifications that JavaScript has made to the page after your client (Firefox) opened it.

In fact, if you view the source as received from the server (Ctrl+U), you will see that there is no input element in the page.

The markup you see in the inspector is all generated by JavaScript, so you need a tool that executes the JavaScript and gives you the resulting HTML.


jsoup is just an HTML parser and does not execute JavaScript. To get the rendered HTML, you will need to switch to Selenium or HtmlUnit.
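A quick way to confirm this from the Java side is to count the input tags in the raw string returned by the server, before it ever reaches jsoup. The sketch below is illustrative only: RawSourceCheck, countInputTags, and both HTML strings are hypothetical stand-ins (not the actual Stellar page), and the textual match is deliberately crude, so it would also count an input tag that appears inside script text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RawSourceCheck {

    // Matches the start of an <input ...> tag, ignoring case.
    private static final Pattern INPUT_TAG =
            Pattern.compile("<input\\b", Pattern.CASE_INSENSITIVE);

    /** Counts textual occurrences of "<input" in the given HTML string. */
    public static int countInputTags(String html) {
        Matcher matcher = INPUT_TAG.matcher(html);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for what a server sends for a JavaScript-driven page.
        String served = "<html><body><div id=\"app\"></div>"
                + "<script src=\"app.js\"></script></body></html>";

        // Hypothetical stand-in for the DOM after the script has run in a browser.
        String rendered = "<html><body><div id=\"app\"><form>"
                + "<input name=\"username\">"
                + "<input name=\"password\" type=\"password\">"
                + "</form></div></body></html>";

        System.out.println("served:   " + countInputTags(served));
        System.out.println("rendered: " + countInputTags(rendered));
    }
}
```

In the question's program, the same check could be applied to the string returned by GetPageContent. A count of zero there means the inputs were never in the response, so no parser can recover them.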

The text sent by the server is the same as the output you got from jsoup.

After the page is loaded in a web browser, the input tags are created dynamically by JavaScript, which is why you cannot see them in your output.
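The URL in the question points the same way: everything after the # is a fragment, which the browser keeps for itself and does not send in the HTTP request. The sketch below uses only java.net.URI to show what the server is actually asked for. FragmentCheck and requestTarget are hypothetical names introduced for illustration.

```java
import java.net.URI;

public class FragmentCheck {

    /** Returns the path (plus query, if any) that goes on the HTTP request line. */
    public static String requestTarget(String url) {
        URI uri = URI.create(url);
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/"; // an empty path is requested as "/"
        }
        String query = uri.getRawQuery();
        return query == null ? path : path + "?" + query;
    }

    public static void main(String[] args) {
        String url = "https://launch.stellar.org/#/login";
        URI uri = URI.create(url);

        // The fragment is parsed out separately and is not part of the request target.
        System.out.println("request target: " + requestTarget(url));
        System.out.println("fragment:       " + uri.getFragment());
    }
}
```

The server therefore only receives a request for the root path, and the login view is presumably selected by the page's own JavaScript once it runs in a browser.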
