
What is the fastest way to get HTML content using Java?

I have this, but I was wondering if there is a faster way:

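        URL url = new URL(page);
        InputStream is = new BufferedInputStream(url.openConnection().getInputStream());
        // No charset is given, so the reader decodes with the default charset.
        BufferedReader in = new BufferedReader(new InputStreamReader(is));
        String tmp;
        StringBuilder sb = new StringBuilder();
        // readLine() strips line terminators, so lines are joined without newlines.
        while ((tmp = in.readLine()) != null) {
            sb.append(tmp);
        }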

The network is probably the biggest overhead, so there isn't much you can do on the Java side. But using IOUtils is at least much faster to implement:

String page = IOUtils.toString(url.openConnection().getInputStream());

Remember to close the underlying stream.
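As a rough sketch of what closing the underlying stream can look like, the following uses try-with-resources and only the standard library (it assumes Java 9 or later for `InputStream.readAllBytes`). The class name `PageFetcher`, the methods `readAll` and `fetch`, and the choice of UTF-8 are assumptions made for illustration; they are not part of the answer above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; all names here are illustrative only.
class PageFetcher {

    // Reads the whole stream into a String and closes it afterwards,
    // even if reading throws.
    static String readAll(InputStream in, Charset charset) throws IOException {
        try (InputStream stream = in) {
            return new String(stream.readAllBytes(), charset);
        }
    }

    // Opens the URL and delegates to readAll. UTF-8 is an assumption;
    // the real page may use a different charset.
    static String fetch(String page) throws IOException {
        InputStream in = new URL(page).openConnection().getInputStream();
        return readAll(in, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Demonstrated on an in-memory stream so that no network is needed.
        byte[] html = "<html><body>hi</body></html>".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(new ByteArrayInputStream(html), StandardCharsets.UTF_8));
    }
}
```

The same try-with-resources pattern should apply when the body is read with `IOUtils.toString` instead of `readAllBytes`; only the line inside the `try` block changes.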

If you need to manipulate the HTML, use a library for it, for example jsoup.

jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jquery-like methods.

Example:

Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
Elements newsHeadlines = doc.select("#mp-itn b a");

If you're using Apache Commons IO's IOUtils as Tomasz suggests, there's an even simpler method: toString(URL), or its preferred cousins that take a charset (of course, that requires knowing the resource's charset in advance).

String string = IOUtils.toString( new URL( "http://some.url" ));

or

String string = IOUtils.toString( new URL( "http://some.url" ), "US-ASCII" );
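When the charset is not known in advance, one possible approach is to look at the `charset` parameter of the response's Content-Type header. The sketch below shows only the parsing step, using the standard library; the class name `CharsetSniffer`, the method `fromContentType`, and the fallback policy are hypothetical, and pages that declare their charset only in a `<meta>` tag are not handled.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper; the name and the fallback policy are illustrative only.
class CharsetSniffer {

    // Extracts the charset parameter from a Content-Type value such as
    // "text/html; charset=ISO-8859-1". Returns the fallback when the value
    // is null, has no charset parameter, or names an unknown charset.
    static Charset fromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers both illegal and unsupported charset names.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(fromContentType("text/html; charset=ISO-8859-1", StandardCharsets.UTF_8));
    }
}
```

With a `URLConnection`, the header value is available from `getContentType()`, and the resulting `Charset` could then be passed to one of the charset-taking `IOUtils.toString` overloads.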
