Converting webpage into HTML

I want to convert a webpage into an HTML page programmatically.
I searched many sites, but they only provide details on things like converting a page to PDF.
For now, my program relies on me saving the page as .html first and then extracting the necessary data from that file.
Is there any way to convert the webpage to an HTML page from code?
Any help would be appreciated.

Let me explain in more detail.

I am extracting the names of users who like a page that I am an admin of. I found a link, https://www.facebook.com/browse/?type=page_fans&page_id=pageid, where I can find the list of users. To get the data, I first have to save that page as a .html file and then extract the necessary data from it. What I need is for my program to do that saving step itself, that is, to fetch the page as HTML. I hope my question is clear now.

Oracle provides the following code snippet for programmatically retrieving an HTML page:

import java.net.*;
import java.io.*;

public class URLReader {
    public static void main(String[] args) throws Exception {

        URL oracle = new URL("http://www.oracle.com/");
        BufferedReader in = new BufferedReader(
                new InputStreamReader(oracle.openStream()));

        String inputLine;
        while ((inputLine = in.readLine()) != null)
            System.out.println(inputLine);
        in.close();
    }
}

Instead of printing to the console, you can save the contents to a file by using a FileWriter and a BufferedWriter (example taken from another question):

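    // "fileName" is a placeholder for the output path;
    // "in" is the BufferedReader opened on the URL in the snippet above.
    String line;
    try (BufferedWriter fbw = new BufferedWriter(new FileWriter("fileName"))) {
        while ((line = in.readLine()) != null) {
            fbw.write(line + "\n");
        }
    } // try-with-resources flushes and closes the writer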
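
Putting the two steps together, here is a minimal sketch of a complete program that fetches a page and writes it straight to a file. Note that it swaps the reader/writer loop for `Files.copy`, which copies the raw bytes and so avoids guessing the page's character encoding. The class name `PageSaver`, the helper `savePage`, and the output name `page.html` are made up for illustration; the URL is the one from the Oracle example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class PageSaver {

    // Hypothetical helper: copies the raw bytes served at pageUrl into target,
    // overwriting the file if it already exists.
    public static void savePage(String pageUrl, Path target) throws IOException {
        try (InputStream in = URI.create(pageUrl).toURL().openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed output file name; adjust as needed.
        savePage("http://www.oracle.com/", Paths.get("page.html"));
    }
}
```

This only retrieves what the server sends for a plain, anonymous request, so the saved file can then be parsed the same way as a page saved by hand, provided the server returns the same content to the program as it does to the browser.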

Webpages are already HTML. If you want to save a webpage as HTML, you can do so in Firefox via the Firefox > Save Page As menu, or through the File menu in other browsers.

If you need to download multiple pages as HTML from the same website or from a list of URLs, there is software that makes this easier: http://www.httrack.com/
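
If the list of URLs is short and you would rather stay in Java, the same idea can be sketched as a loop. This is only a rough illustration, not a replacement for HTTrack: it does not follow links or download images and stylesheets. The class name `BatchSaver`, the method `saveAll`, the `pages` directory, and the `page-N.html` naming scheme are all assumptions made for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BatchSaver {

    // Hypothetical helper: saves each URL as page-1.html, page-2.html, ...
    // inside outputDir and returns the paths that were written.
    public static List<Path> saveAll(List<String> urls, Path outputDir) throws IOException {
        Files.createDirectories(outputDir);
        List<Path> saved = new ArrayList<>();
        for (int i = 0; i < urls.size(); i++) {
            Path target = outputDir.resolve("page-" + (i + 1) + ".html");
            try (InputStream in = URI.create(urls.get(i)).toURL().openStream()) {
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
            saved.add(target);
        }
        return saved;
    }

    public static void main(String[] args) throws IOException {
        // Example URLs taken from this thread; replace with your own list.
        List<String> urls = Arrays.asList("http://www.oracle.com/", "http://www.httrack.com/");
        for (Path p : saveAll(urls, Paths.get("pages"))) {
            System.out.println("Saved " + p);
        }
    }
}
```

The loop stops at the first URL that fails to download; catching the IOException inside the loop would let it skip bad URLs instead.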
