java - get html from ip address

I have devices that publish an HTML page when you connect via their IP address. For example, if I go to "192.168.1.104" on my computer, I see the HTML page the device publishes. I am trying to scrape this HTML, but I am getting some errors, specifically a MalformedURLException at the first line of my method. I have posted my method below. I found some code for getting HTML and tweaked it for my needs. Thanks

public String getSbuHtml(String ipToPoll) throws IOException, SocketTimeoutException {
    URL url = new URL("http", ipToPoll, -1, "/");
    URLConnection con = url.openConnection();
    con.setConnectTimeout(1000);
    con.setReadTimeout(1000);
    Pattern p = Pattern.compile("text/html;\\s+charset=([^\\s]+)\\s*");
    Matcher m = p.matcher(con.getContentType());
    String charset = m.matches() ? m.group(1) : "ISO-8859-1";
    BufferedReader r = new BufferedReader(
            new InputStreamReader(con.getInputStream(), charset));
    String line = null;
    StringBuilder buf = new StringBuilder();
    while ((line = r.readLine()) != null) {
        buf.append(line).append(System.getProperty("line.separator"));
    }
    return buf.toString();
}

EDIT: The above code has been changed to construct the URL so that it works with an IP address. However, when I try to get the contentType from the connection, it is null.

A URL (Uniform Resource Locator) must name a resource to locate (index.html) along with the means of network communication (http://). So an example of a valid URL is:

http://192.168.1.104:8080/app/index.html 

192.168.1.104 on its own is not a URL.

You need to prepend http:// to the String you pass to the method.
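As a rough illustration of the point above, the following sketch shows that the URL constructor rejects a bare IP address and accepts it once a scheme is prepended. The class and method names (UrlFromIp, parses, withHttpScheme) are hypothetical helpers, not part of the original method.

```java
import java.net.MalformedURLException;
import java.net.URL;

class UrlFromIp {

    // Returns true if the string parses as a URL, false if the constructor rejects it.
    public static boolean parses(String spec) {
        try {
            new URL(spec);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    // Prepends "http://" when the caller passed a bare host or IP address.
    public static String withHttpScheme(String hostOrUrl) {
        if (hostOrUrl.startsWith("http://") || hostOrUrl.startsWith("https://")) {
            return hostOrUrl;
        }
        return "http://" + hostOrUrl;
    }

    public static void main(String[] args) {
        String ip = "192.168.1.104";
        System.out.println(parses(ip));                 // false: no protocol
        System.out.println(parses(withHttpScheme(ip))); // true
    }
}
```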

Create your URL as follows:

URL url = new URL("http", ipToPoll, -1, "/");

And since you're reading a potentially long HTML page, I suppose buffering would help here:

BufferedReader r = new BufferedReader(
                   new InputStreamReader(con.getInputStream(), charset));
String line = null;
StringBuilder buf = new StringBuilder();
while ((line = r.readLine()) != null) {
    buf.append(line).append(System.getProperty("line.separator"));
}
return buf.toString();
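The read loop above never closes the reader. One possible variation, sketched here under the assumption that wrapping IOException in an unchecked exception is acceptable, uses try-with-resources so the stream is closed even when reading fails. ReadAllLines and readAll are hypothetical names, and a StringReader stands in for the device's input stream.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class ReadAllLines {

    // Reads every line from the source, normalising line endings to the
    // platform separator, and closes the reader when done.
    public static String readAll(Reader source) {
        StringBuilder buf = new StringBuilder();
        try (BufferedReader r = new BufferedReader(source)) {
            String line;
            while ((line = r.readLine()) != null) {
                buf.append(line).append(System.lineSeparator());
            }
        } catch (IOException e) {
            // Assumption for this sketch: callers prefer an unchecked exception.
            throw new UncheckedIOException(e);
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        // In the real method the Reader would be
        // new InputStreamReader(con.getInputStream(), charset).
        System.out.print(readAll(new StringReader("<html>\n<body>ok</body>\n</html>")));
    }
}
```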


EDIT: In response to your problem of contentType coming back null.

Before you inspect any headers, as with getContentType(), or retrieve content with getInputStream(), you need to actually establish a connection with the URL resource by calling connect():

URL url = new URL("http", ipToPoll, "/"); // -1 removed; assuming port = 80 always
// check your device html page address; change "/" to "/index.html" if required

URLConnection con = url.openConnection();

// set connection properties
con.setConnectTimeout(1000);
con.setReadTimeout(1000);

// establish connection
con.connect();

// get "content-type" header
Pattern p = Pattern.compile("text/html;\\s+charset=([^\\s]+)\\s*");
Matcher m = p.matcher(con.getContentType());
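Note that getContentType() is documented to return null when the content type is not known, and Pattern.matcher(null) then throws a NullPointerException. A minimal null-safe sketch follows. The names CharsetFromContentType and charsetOf are hypothetical, and loosening the pattern to accept "text/html;charset=..." without a space is an assumption about what the device might send.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharsetFromContentType {

    // Same idea as the pattern above, but whitespace after ';' is optional
    // and matching is case-insensitive (both are assumptions for this sketch).
    private static final Pattern CHARSET = Pattern.compile(
            "text/html;\\s*charset=([^\\s;]+)\\s*", Pattern.CASE_INSENSITIVE);

    public static final String FALLBACK = "ISO-8859-1";

    // Returns the charset named in the header, or the fallback when the
    // header is missing or does not carry a charset parameter.
    public static String charsetOf(String contentType) {
        if (contentType == null) {
            return FALLBACK;
        }
        Matcher m = CHARSET.matcher(contentType);
        return m.matches() ? m.group(1) : FALLBACK;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf(null));                       // ISO-8859-1
        System.out.println(charsetOf("text/html; charset=UTF-8")); // UTF-8
    }
}
```

In the method above, this would be called as charsetOf(con.getContentType()).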

The name openConnection() suggests otherwise, but calling it doesn't establish any connection. It just gives you an instance of URLConnection so that you can specify connection properties, such as the connect timeout with setConnectTimeout().
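One way to see this is to call openConnection() for an address that nothing answers on. The sketch below uses 192.0.2.1, an address reserved for documentation, as an assumed stand-in for an unreachable device; OpenWithoutConnecting and prepare are hypothetical names. The call returns a configured object without any network traffic.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLConnection;

class OpenWithoutConnecting {

    // Builds and configures a connection object; no socket is opened here.
    public static URLConnection prepare(String host) {
        try {
            URL url = new URL("http", host, "/");
            URLConnection con = url.openConnection();
            con.setConnectTimeout(1000);
            con.setReadTimeout(1000);
            return con;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        URLConnection con = prepare("192.0.2.1");
        // Only connect(), getInputStream() and similar calls would touch the network.
        System.out.println(con.getURL());            // http://192.0.2.1/
        System.out.println(con.getConnectTimeout()); // 1000
    }
}
```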

If you're finding this hard to understand, it may help to know that it's analogous to doing a new File(), which simply represents a file path but doesn't create a file (assuming it doesn't exist already) unless you go ahead and call File.createNewFile() (or pass it to a FileWriter).
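The File analogy can be checked with a short sketch. FileHandleDemo and demo are hypothetical names, and the file name page.html is arbitrary; the code works in a fresh temporary directory so that the path is known not to exist beforehand.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

class FileHandleDemo {

    // Returns { exists after new File(...), exists after createNewFile() }.
    public static boolean[] demo() {
        try {
            File dir = Files.createTempDirectory("file-demo").toFile();
            File f = new File(dir, "page.html");
            boolean afterNew = f.exists();    // constructing the object touched nothing on disk
            f.createNewFile();                // this call actually creates the file
            boolean afterCreate = f.exists();
            f.delete();                       // clean up the temporary file and directory
            dir.delete();
            return new boolean[] { afterNew, afterCreate };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println(r[0]); // false
        System.out.println(r[1]); // true
    }
}
```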
