
Converting webpage into HTML

I want to convert a webpage into an HTML page programmatically.
I searched many sites, but they only cover things like converting a page to PDF.
For my program, I currently save the page as .html and then extract the necessary data.
Is there any way to convert the webpage to an HTML page? Can anyone help me?
Any help would be appreciated.

Well, I can explain in detail.

I am extracting the names of users who like a page that I'm an admin of. I found a link, https://www.facebook.com/browse/?type=page_fans&page_id=pageid, where I can find the list of users. To get that list, I first have to save the page as a .html file and then extract the necessary data from it. What I need is to convert that page into an HTML page using my program instead. I hope my question is clear now.

Oracle provides the following code snippet for programmatically retrieving an HTML page.

import java.net.*;
import java.io.*;

public class URLReader {
    public static void main(String[] args) throws Exception {

        URL oracle = new URL("http://www.oracle.com/");
        BufferedReader in = new BufferedReader(
        new InputStreamReader(oracle.openStream()));

        String inputLine;
        while ((inputLine = in.readLine()) != null)
            System.out.println(inputLine);
        in.close();
    }
}

Instead of printing to the console, you can save the contents to a file by using a FileWriter and a BufferedWriter (example from a related question):

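    // 'in' is the BufferedReader from the URLReader example above.
    // try-with-resources flushes and closes the writer when the loop ends.
    try (BufferedWriter fbw = new BufferedWriter(new FileWriter("fileName"))) {
        String line;
        while ((line = in.readLine()) != null) {
            fbw.write(line + "\n");
        }
    }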
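Putting the two snippets above together, a self-contained version might look like the sketch below. The class name `PageFetcher` and the method names `fetch` and `save` are made up for illustration and are not part of the Oracle tutorial. It assumes Java 9 or later (for `InputStream.readAllBytes`) and that the page is UTF-8 encoded. `fetch` is probably the more useful half if the goal is only to extract data, since it keeps the HTML in memory and skips the intermediate file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper class; all names here are illustrative only.
public class PageFetcher {

    // Reads the whole response body into a String (assumes UTF-8 content).
    public static String fetch(URL url) throws IOException {
        try (InputStream in = url.openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Streams the response body into a file, replacing any existing file.
    public static void save(URL url, Path target) throws IOException {
        try (InputStream in = url.openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: java PageFetcher <url> <output-file>");
            return;
        }
        URL url = URI.create(args[0]).toURL();
        save(url, Paths.get(args[1]));
        String html = fetch(url);
        System.out.println(html.length() + " characters downloaded");
    }
}
```

For example, `java PageFetcher http://www.oracle.com/ page.html` would write the page to `page.html` and then fetch it a second time into memory, just to demonstrate both methods.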

Webpages are already HTML. If you want to save a webpage as HTML, you can do this via the Firefox > Save Page As menu in Firefox, or through the File menu in other browsers.

If you need to download multiple pages as HTML from the same website or from a list of URLs, there is software that makes this easier: http://www.httrack.com/
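If you would rather stay inside your own Java program for a plain list of URLs, a simple loop may be enough. The sketch below is not how HTTrack works and does not follow links or fetch images and stylesheets. The class name `BatchSaver`, the method `saveAll`, and the `page-N.html` naming scheme are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; all names here are illustrative only.
public class BatchSaver {

    // Saves each URL as page-0.html, page-1.html, ... inside dir
    // and returns the paths that were written, in the same order.
    public static List<Path> saveAll(List<URL> urls, Path dir) throws IOException {
        Files.createDirectories(dir);
        List<Path> saved = new ArrayList<>();
        for (int i = 0; i < urls.size(); i++) {
            Path target = dir.resolve("page-" + i + ".html");
            try (InputStream in = urls.get(i).openStream()) {
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
            saved.add(target);
        }
        return saved;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java BatchSaver <url> [<url> ...]");
            return;
        }
        List<URL> urls = new ArrayList<>();
        for (String arg : args) {
            urls.add(URI.create(arg).toURL());
        }
        // "pages" is an arbitrary output directory chosen for this sketch.
        for (Path p : saveAll(urls, Paths.get("pages"))) {
            System.out.println("saved " + p);
        }
    }
}
```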


 