How can I fully parse HTML without a third-party library?
I am puzzled by this question. I can parse HTML in the following way:
package org.owls.parser.html;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class HTMLParser {
    public static String getHTTPStringsFromWeb(String urlStr) throws Exception {
        StringBuilder sb = new StringBuilder();
        URL url = new URL(urlStr);
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        if (con.getResponseCode() == HttpURLConnection.HTTP_OK) {
            // try-with-resources closes the reader even if readLine() throws
            try (BufferedReader br = new BufferedReader(
                    new InputStreamReader(con.getInputStream()))) {
                String line;
                while ((line = br.readLine()) != null) {
                    sb.append(line);
                }
            }
        }
        return sb.toString();
    }
}
This code works well, but there is a problem: it cannot retrieve dynamic content that is produced by AJAX calls.
So I want to get the full, rendered page. Is that possible?
People talk about jsoup, but I want to know whether there is any way to do this natively.
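For what it's worth, once the raw HTML string is in hand, the JDK itself ships a basic SGML-style parser (`javax.swing.text.html.parser.ParserDelegator` with an `HTMLEditorKit.ParserCallback`) that can walk tags and attributes with no third-party dependency. A minimal sketch (the class name and the sample HTML are just for illustration):

```java
import java.io.StringReader;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class NativeHtmlParseDemo {
    // Returns the href of the first <a> tag found in the HTML, or null.
    static String extractFirstHref(String html) throws Exception {
        final String[] href = {null};
        HTMLEditorKit.ParserCallback cb = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                if (t == HTML.Tag.A && href[0] == null) {
                    Object v = a.getAttribute(HTML.Attribute.HREF);
                    if (v != null) {
                        href[0] = v.toString();
                    }
                }
            }
        };
        // true = ignore any charset declaration inside the document
        new ParserDelegator().parse(new StringReader(html), cb, true);
        return href[0];
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><body><a href=\"http://example.com\">link</a></body></html>";
        System.out.println(extractFirstHref(html));
    }
}
```

Note this parser only sees the static markup, so it has the same limitation as the fetch code above: anything injected later by JavaScript is invisible to it.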
Thanks :D
There is an inherent problem in what you are trying to do: you need a web browser/environment to execute the AJAX requests. Reading the pages into a string and looking for URLs is not enough; the scripts may be doing something special with the data that you won't be able to reproduce.
You will have to use something like PhantomJS, which can load and render pages in a headless environment.
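If you want to stay in Java, one option is to drive PhantomJS as an external process. This is a rough sketch, assuming the phantomjs binary is on your PATH and a hypothetical `render.js` script that loads the URL, waits for the AJAX calls to settle, and prints `page.content` to stdout:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.Arrays;
import java.util.List;

public class PhantomRunner {
    // Command line for PhantomJS; "render.js" is a hypothetical script
    // that prints the fully rendered DOM of the given URL.
    static List<String> buildCommand(String scriptPath, String url) {
        return Arrays.asList("phantomjs", scriptPath, url);
    }

    // Launches PhantomJS and captures whatever it prints (the rendered HTML).
    static String run(String scriptPath, String url) throws Exception {
        Process p = new ProcessBuilder(buildCommand(scriptPath, url))
                .redirectErrorStream(true)
                .start();
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        p.waitFor();
        return sb.toString();
    }

    public static void main(String[] args) {
        try {
            System.out.println(run("render.js", "http://example.com"));
        } catch (Exception e) {
            // phantomjs not installed, or render.js missing
            System.out.println("phantomjs not available: " + e.getMessage());
        }
    }
}
```

The rendered HTML you get back can then be fed to whatever parsing approach you prefer, since by that point the AJAX content is baked into the markup.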