[JAVA] Getting HTML links from a web page

Question

I want to get the link shown in this picture using Java (image below). There are a few more links on that web page. I found this code on Stack Overflow, but I don't understand how to use it.

 import org.jsoup.Jsoup;
 import org.jsoup.nodes.Document;
 import org.jsoup.nodes.Element;
 import org.jsoup.select.Elements;

 public class weber{
    public static void main(String[] args)throws Exception{
        String url = "http://www.skyovnis.com/category/ufology/";
        Document doc = Jsoup.connect(url).get();

        /*String question = doc.select("#site-inner").text();
        System.out.println("Question: " + question);*/

        Elements anser = doc.select("#container .entry-title a");
        for (Element anse : anser){
            System.out.println("Answer: " + anse.text());
        }
    }
}

The code has been edited from the original I found. Please help.

Answer 1

For your URL, the following code works fine.

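import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// The class name is illustrative; any name works.
public class LinkLister {

    public static void main(String[] args) {
        try {
            // The URL must include the protocol (http:// or https://)
            Document doc = Jsoup.connect("http://www.skyovnis.com/category/ufology/")
                    .userAgent("Mozilla")
                    .get();

            // Get the page title
            String title = doc.title();
            System.out.println("title : " + title);

            // Get all links (this is what you want)
            Elements links = doc.select("a[href]");
            for (Element link : links) {
                // Get the value of the href attribute and the link text
                System.out.println("\nlink : " + link.attr("href"));
                System.out.println("text : " + link.text());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}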

The output was:

title : Ufology

link : http://www.shop.skyovnis.com/
text : Shop

link : http://www.shop.skyovnis.com/product-category/books/
text : Books
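The links printed above happen to be absolute, but a page may also use relative href values. As a minimal sketch, assuming a made-up HTML fragment and a hypothetical helper named absoluteLinks, Jsoup's absUrl("href") resolves each link against the document's base URI:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

class AbsoluteLinksSketch {

    // Collects the absolute URL of every <a href> in the document.
    // absUrl() resolves relative hrefs against the document's base URI.
    static List<String> absoluteLinks(Document doc) {
        List<String> urls = new ArrayList<>();
        for (Element link : doc.select("a[href]")) {
            urls.add(link.absUrl("href"));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Made-up markup standing in for the real page (assumption).
        String html = "<a href=\"/category/ufology/\">Ufology</a>"
                + "<a href=\"http://www.shop.skyovnis.com/\">Shop</a>";
        // The second argument is the base URI used to resolve relative links.
        Document doc = Jsoup.parse(html, "http://www.skyovnis.com/");
        for (String url : absoluteLinks(doc)) {
            System.out.println("link : " + url);
        }
    }
}
```

When the document comes from Jsoup.connect(url).get(), the base URI is taken from the fetched URL, so the same helper should work without passing it explicitly.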

The following code filters the links by their text.

        for (Element link : links) {
            // find the link whose text contains the given phrase
            if (link.text().contains("Arecibo Message")) {
                System.out.println("here is the element you need");
                System.out.println("\nlink : " + link.attr("href"));
                System.out.println("text : " + link.text());
            }
        }
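The same filtering can also be pushed into the selector itself. This is a sketch under assumptions: the HTML fragment and the helper name linksContaining are made up for illustration. It uses Jsoup's :contains() pseudo-selector, which, unlike String.contains, matches case-insensitively:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

class LinkTextFilterSketch {

    // Returns the <a href> elements whose text contains the given phrase.
    // The phrase is pasted into the selector as-is, so this sketch assumes
    // it contains no parentheses.
    static Elements linksContaining(Document doc, String phrase) {
        return doc.select("a[href]:contains(" + phrase + ")");
    }

    public static void main(String[] args) {
        // Made-up markup standing in for the real page (assumption).
        String html = "<a href=\"http://example.com/arecibo\">The Arecibo Message</a>"
                + "<a href=\"http://example.com/shop\">Shop</a>";
        Document doc = Jsoup.parse(html);
        for (Element link : linksContaining(doc, "Arecibo Message")) {
            System.out.println("link : " + link.attr("href"));
            System.out.println("text : " + link.text());
        }
    }
}
```

If case-sensitive matching matters, the explicit loop with link.text().contains(...) remains the simpler choice.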

It's recommended to specify a user agent in Jsoup (via userAgent) to avoid HTTP 403 error responses.

Document doc = Jsoup.connect("http://anyurl.com").userAgent("Mozilla").get();
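If the server still rejects the request, Jsoup reports it as an HttpStatusException, which carries the status code. A minimal sketch, assuming a hypothetical helper named prepare and an arbitrary 10-second timeout (neither comes from the original answer):

```java
import java.io.IOException;

import org.jsoup.Connection;
import org.jsoup.HttpStatusException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class UserAgentSketch {

    // Builds a connection with a user agent and a timeout; nothing is
    // sent over the network until get() is called on the result.
    static Connection prepare(String url) {
        return Jsoup.connect(url)
                .userAgent("Mozilla")
                .timeout(10_000); // milliseconds; the value is an arbitrary choice
    }

    public static void main(String[] args) {
        try {
            Document doc = prepare("http://www.skyovnis.com/category/ufology/").get();
            System.out.println("title : " + doc.title());
        } catch (HttpStatusException e) {
            // Thrown for non-OK responses such as 403 or 404
            System.err.println("HTTP " + e.getStatusCode() + " for " + e.getUrl());
        } catch (IOException e) {
            // Other I/O problems, for example a timeout or an unreachable host
            e.printStackTrace();
        }
    }
}
```

HttpStatusException is a subclass of IOException, so it has to be caught first.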

"Onna malli mage yuthukama kala."

Reference:

https://www.mkyong.com/java/jsoup-html-parser-hello-world-examples/

Answer 1: accepted, 2016-09-10 17:55:22
