
java - get html from ip address

I have devices that publish an HTML page when you connect to them via their IP address. For example, if I go to "192.168.1.104" on my computer, I see the HTML page the device publishes. I am trying to scrape this HTML, but I am getting errors, specifically a MalformedURLException at the first line of my method. I have posted my method below; I found some code for getting HTML and tweaked it for my needs. Thanks.

public String getSbuHtml(String ipToPoll) throws IOException, SocketTimeoutException {
    URL url = new URL("http", ipToPoll, -1, "/");
    URLConnection con = url.openConnection();
    con.setConnectTimeout(1000);
    con.setReadTimeout(1000);
    Pattern p = Pattern.compile("text/html;\\s+charset=([^\\s]+)\\s*");
    Matcher m = p.matcher(con.getContentType());
    String charset = m.matches() ? m.group(1) : "ISO-8859-1";
    BufferedReader r = new BufferedReader(
            new InputStreamReader(con.getInputStream(), charset));
    String line = null;
    StringBuilder buf = new StringBuilder();
    while ((line = r.readLine()) != null) {
        buf.append(line).append(System.getProperty("line.separator"));
    }
    return buf.toString();
}

EDIT: The above code has been changed to construct the URL in a way that works with an IP address. However, when I try to get the content type from the connection, it is null.

A URL (Uniform Resource Locator) must name a resource to locate (index.html) along with the means of network communication (http://). So an example of a valid URL is

http://192.168.1.104:8080/app/index.html 

A bare 192.168.1.104 is not a URL, because it has no protocol.
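To make the difference concrete, here is a minimal sketch that feeds both strings to java.net.URL. The class name UrlPartsDemo and the helper isParsableUrl are made up for this illustration; only the two address strings come from the posts above.

```java
import java.net.MalformedURLException;
import java.net.URL;

class UrlPartsDemo {
    /** Returns true if java.net.URL accepts the text, false if it rejects it. */
    static boolean isParsableUrl(String text) {
        try {
            new URL(text);
            return true;
        } catch (MalformedURLException e) {
            return false; // a bare IP fails here because it has no protocol
        }
    }

    public static void main(String[] args) throws MalformedURLException {
        System.out.println(isParsableUrl("192.168.1.104"));                            // false
        System.out.println(isParsableUrl("http://192.168.1.104:8080/app/index.html")); // true

        // The parts of the valid example URL.
        URL url = new URL("http://192.168.1.104:8080/app/index.html");
        System.out.println(url.getProtocol()); // http
        System.out.println(url.getHost());     // 192.168.1.104
        System.out.println(url.getPort());     // 8080
        System.out.println(url.getPath());     // /app/index.html
    }
}
```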

You need to prepend http:// to the String you pass to the method.
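One way to do that without doubling the prefix when the caller already supplies it is sketched below. SchemeHelper and withHttpScheme are hypothetical names, and defaulting to plain http is an assumption about the devices in the question.

```java
import java.util.Locale;

class SchemeHelper {
    /** Prepends "http://" unless the text already starts with an http or https scheme. */
    static String withHttpScheme(String hostOrUrl) {
        String trimmed = hostOrUrl.trim();
        String lower = trimmed.toLowerCase(Locale.ROOT);
        if (lower.startsWith("http://") || lower.startsWith("https://")) {
            return trimmed; // already has a scheme, leave it alone
        }
        return "http://" + trimmed;
    }

    public static void main(String[] args) {
        System.out.println(withHttpScheme("192.168.1.104"));        // http://192.168.1.104
        System.out.println(withHttpScheme("http://192.168.1.104")); // http://192.168.1.104
    }
}
```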

Create your URL as follows:

URL url = new URL("http", ipToPoll, -1, "/");

And since you're reading a potentially long HTML page I suppose buffering would help here:

BufferedReader r = new BufferedReader(
                   new InputStreamReader(con.getInputStream(), charset));
String line = null;
StringBuilder buf = new StringBuilder();
while ((line = r.readLine()) != null) {
    buf.append(line).append(System.getProperty("line.separator"));
}
return buf.toString();
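The same loop can be wrapped in try-with-resources so the reader is closed even if reading fails. This is only a sketch: ReadAllDemo and readAll are illustrative names, and the StringReader in main stands in for the InputStreamReader you would build from con.getInputStream().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class ReadAllDemo {
    /** Reads every line from the source and joins them with the platform line separator. */
    static String readAll(Reader source) throws IOException {
        StringBuilder buf = new StringBuilder();
        // try-with-resources closes the reader (and the wrapped source) on exit.
        try (BufferedReader r = new BufferedReader(source)) {
            String line;
            while ((line = r.readLine()) != null) {
                buf.append(line).append(System.lineSeparator());
            }
        }
        return buf.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in input; a real caller would pass
        // new InputStreamReader(con.getInputStream(), charset).
        System.out.print(readAll(new StringReader("<html>\n<body>hi</body>\n</html>")));
    }
}
```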


EDIT : In response to your contentType coming null problem.

Before you inspect any headers with getContentType() or retrieve content with getInputStream(), you need to actually establish a connection to the URL resource by calling connect():

URL url = new URL("http", ipToPoll, "/"); // -1 removed; assuming port = 80 always
// check your device html page address; change "/" to "/index.html" if required

URLConnection con = url.openConnection();

// set connection properties
con.setConnectTimeout(1000);
con.setReadTimeout(1000);

// establish connection
con.connect();

// get "content-type" header
Pattern p = Pattern.compile("text/html;\\s+charset=([^\\s]+)\\s*");
Matcher m = p.matcher(con.getContentType());
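Note that getContentType() is documented to return null when the content type is not known, and Pattern.matcher(null) throws a NullPointerException, so it may be worth guarding that last line. The sketch below reuses the pattern and the ISO-8859-1 fallback from the question; the class and method names are made up, and whether your device actually sends a Content-Type header is something you would have to check.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharsetFromHeader {
    private static final String DEFAULT_CHARSET = "ISO-8859-1";
    private static final Pattern CHARSET =
            Pattern.compile("text/html;\\s+charset=([^\\s]+)\\s*");

    /** Extracts the charset from a Content-Type value, falling back to the default. */
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return DEFAULT_CHARSET; // header missing or unknown
        }
        Matcher m = CHARSET.matcher(contentType);
        return m.matches() ? m.group(1) : DEFAULT_CHARSET;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf(null));                       // ISO-8859-1
        System.out.println(charsetOf("text/html"));                // ISO-8859-1
        System.out.println(charsetOf("text/html; charset=UTF-8")); // UTF-8
    }
}
```

With this in place the call becomes charsetOf(con.getContentType()) and a missing header no longer aborts the method.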

Despite what its name suggests, calling openConnection() does not establish a connection. It only gives you a URLConnection instance on which you can set connection properties, such as the connect timeout with setConnectTimeout().
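A small sketch of that configuration phase: the code below builds and configures the connection object for the address from the question and never calls connect(), so it should run even when no device is reachable. OpenConnectionDemo and prepare are names invented for this example.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;

class OpenConnectionDemo {
    /** Builds and configures a connection object without connecting to the host. */
    static URLConnection prepare(String host) throws IOException {
        URL url = new URL("http", host, "/");
        URLConnection con = url.openConnection(); // only creates the object
        con.setConnectTimeout(1000); // milliseconds
        con.setReadTimeout(1000);    // milliseconds
        return con;
    }

    public static void main(String[] args) throws IOException {
        URLConnection con = prepare("192.168.1.104");
        System.out.println(con.getURL());            // http://192.168.1.104/
        System.out.println(con.getConnectTimeout()); // 1000
    }
}
```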

If you're finding this hard to understand it may help to know that it's analogous to doing a new File() which simply represents a File but doesn't create one (assuming it doesn't exist already) unless you go ahead and call File.createNewFile() (or pass it to a FileReader ).

