How can I fully parse HTML without a third-party library?

I am puzzled by this question.

I can fetch an HTML page as shown below.

package org.owls.parser.html;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
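import java.net.URL;
import java.nio.charset.StandardCharsets;

public class HTMLParser {
    // Returns the raw HTML the server sends for urlStr; no JavaScript is executed.
    public static String getHTTPStringsFromWeb(String urlStr) throws Exception {
        StringBuilder sb = new StringBuilder();
        URL url = new URL(urlStr);
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        try {
            if (con.getResponseCode() == HttpURLConnection.HTTP_OK) {
                // try-with-resources closes the reader even if reading fails.
                // UTF-8 is assumed here; the page may declare another charset.
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(con.getInputStream(), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = br.readLine()) != null) {
                        // readLine() strips the line terminator, so put it back
                        sb.append(line).append('\n');
                    }
                }
            }
        } finally {
            con.disconnect();
        }
        return sb.toString();
    }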
}

This code works well, but there is a problem: it cannot get dynamic data that the page loads through Ajax.

So I want to get the full page. Is that possible?

People talk about jsoup, but I want to know whether there is any way to do this with native Java only.

Thanks :D

There is an inherent problem in what you are trying to do: you need a web browser, or an equivalent environment, to execute the Ajax requests. Reading the scripts into a string and looking for URLs is not enough, because the functions may do something special with the data that you won't be able to support.
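To make that concrete, here is a minimal sketch of the "read it into a string and look for URLs" approach. The page in `SAMPLE_HTML`, the `/api/` endpoint, and the class and method names are all hypothetical, made up for illustration. The script on that page assembles its request URL at run time, so a textual scan only sees a fragment of it, and the list the script would fill is still empty in the fetched markup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ScriptUrlScan {
    // Hypothetical page: the <ul> is empty until the script's Ajax call returns.
    static final String SAMPLE_HTML =
        "<html><body>\n" +
        "<ul id=\"items\"></ul>\n" +
        "<script>\n" +
        "var base = '/api/';\n" +
        "var xhr = new XMLHttpRequest();\n" +
        "xhr.open('GET', base + 'items?page=' + 1);\n" +
        "xhr.onload = function () {\n" +
        "  document.getElementById('items').innerHTML = xhr.responseText;\n" +
        "};\n" +
        "xhr.send();\n" +
        "</script>\n" +
        "</body></html>";

    // Naive scan: collect quoted literals that start with http(s):// or with /.
    static List<String> findUrlLiterals(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = Pattern.compile("['\"]((?:https?://|/)[^'\"]*)['\"]").matcher(html);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Only the literal fragment is found, not the URL the browser would request.
        System.out.println(findUrlLiterals(SAMPLE_HTML));
        // The static markup still holds the empty placeholder list.
        System.out.println(SAMPLE_HTML.contains("<ul id=\"items\"></ul>"));
    }
}
```

On this sample the scan finds only `/api/`, while a browser would request `/api/items?page=1`. Recovering the real URL, and then reproducing whatever the `onload` handler does with the response, amounts to executing the JavaScript.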

You will have to use something like PhantomJS, which can load and parse pages in a headless environment.
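One way to drive PhantomJS from Java is to start it as an external process and read the rendered page from its standard output. The sketch below is only an illustration under several assumptions: a `phantomjs` binary is on the `PATH`, its output is UTF-8, and a fixed 2000 ms delay is long enough for the page's Ajax calls to finish. The class name `PhantomRender` and its methods are hypothetical; the embedded script relies on PhantomJS's `webpage` and `system` modules, `page.open`, `page.content`, and `phantom.exit`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

class PhantomRender {
    // PhantomJS script: open the URL, wait, then print the rendered DOM.
    static final String RENDER_SCRIPT = String.join("\n",
        "var system = require('system');",
        "var page = require('webpage').create();",
        "page.open(system.args[1], function (status) {",
        "  if (status !== 'success') { phantom.exit(1); return; }",
        "  // Assumed fixed delay so pending Ajax calls can complete.",
        "  setTimeout(function () {",
        "    console.log(page.content);",
        "    phantom.exit(0);",
        "  }, 2000);",
        "});");

    // Command line: phantomjs <script file> <url>
    static List<String> buildCommand(Path script, String url) {
        return Arrays.asList("phantomjs", script.toString(), url);
    }

    // Drains a stream fully (kept manual so the sketch also compiles on Java 8).
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Runs PhantomJS on the URL and returns the page content after scripts ran.
    static String render(String url) throws IOException, InterruptedException {
        Path script = Files.createTempFile("render", ".js");
        try {
            Files.write(script, RENDER_SCRIPT.getBytes(StandardCharsets.UTF_8));
            Process p = new ProcessBuilder(buildCommand(script, url))
                    .redirectError(ProcessBuilder.Redirect.INHERIT)
                    .start();
            byte[] html = readAll(p.getInputStream());
            if (p.waitFor() != 0) {
                throw new IOException("phantomjs exited with an error for " + url);
            }
            return new String(html, StandardCharsets.UTF_8);
        } finally {
            Files.deleteIfExists(script);
        }
    }
}
```

Calling `PhantomRender.render(urlStr)` would then take the place of `getHTTPStringsFromWeb(urlStr)`. The Java side stays free of third-party libraries, but the approach still depends on the external PhantomJS binary, and a fixed delay is a crude stand-in for knowing when the page has really finished loading.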
