
Scan website links by selector level with Jsoup (Java)

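I am trying to scan all the links in a web page level by level, using one selector per level.

Below is my code. It reads a fixed number of selector levels. I would like to read more levels with a loop, recursion, or anything else that makes this more flexible, because the number of selector levels may grow beyond 2 in the future.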

    public static void main(String[] args) {
        String website = website("http://www.java2s.com/");
        System.out.println(website);
    }

    private static String website(String url) {
        String lstLink = "";
        try {
            String level[] = {"div.col-md-9 li a", "div#sidebar ul li a"};
            //Level 1
            Document connect = Jsoup.connect(url).get();
            Elements selectLevel1 = connect.select(level[0]);
            for (Element level1 : selectLevel1) {
                lstLink += level1.attr("href") + "\n";

                //Level2
                Document connect2 = Jsoup.connect(level1.attr("href")).get();
                Elements selectLevel2 = connect2.select(level[1]);
                for (Element level2 : selectLevel2) {
                    lstLink += level2.attr("href") + "\n";
                }
            }
        } catch (IOException ex) {
            Logger.getLogger(AWebsite.class.getName()).log(Level.SEVERE, null, ex);
        }
        return lstLink;

    }

Please check this recursive version, which takes the selector for each level from an array:

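    import java.io.IOException;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class AWebsite {

        // One CSS selector per crawl depth; add entries to go deeper.
        static String[] levels = {"div.col-md-9 li a", "div#sidebar ul li a"};

        private static String getRecursive(String href, int level) {
            // Stop once every selector level has been used.
            if (level >= levels.length) {
                return "";
            }

            StringBuilder links = new StringBuilder();
            try {
                Document doc = Jsoup.connect(href).get();
                Elements elements = doc.select(levels[level]);

                for (Element element : elements) {
                    // "abs:href" resolves relative links against the page URL;
                    // it is empty when no absolute URL can be built.
                    String absHref = element.attr("abs:href");
                    if (!absHref.isEmpty()) {
                        links.append(absHref).append("\n");
                        // Descend into the linked page with the next selector.
                        links.append(getRecursive(absHref, level + 1));
                    }
                }
            } catch (IOException e1) {
                e1.printStackTrace();
            }
            return links.toString();
        }

        public static void main(String[] args) {
            String website = getRecursive("http://www.java2s.com/", 0);
            System.out.println(website);
        }
    }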
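
The question also asks about a loop. As a rough sketch of that alternative (not part of the original answer), the same level-by-level scan can be written without recursion by keeping a list of pages to visit at the current level. The names `LevelCrawler`, `LinkFetcher` and `crawl` are hypothetical, and the page fetching is hidden behind an interface so that the example runs against an in-memory fake site instead of the network.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class LevelCrawler {

    /** Hypothetical hook: returns the absolute link URLs that `selector` matches on `url`. */
    interface LinkFetcher {
        List<String> fetch(String url, String selector) throws IOException;
    }

    /**
     * Walks the selector list level by level with plain loops instead of recursion.
     * Duplicate links are recorded only once, in the order they were first found.
     */
    static List<String> crawl(String startUrl, String[] levels, LinkFetcher fetcher) {
        Set<String> seen = new LinkedHashSet<>();   // keeps discovery order, drops duplicates
        List<String> frontier = List.of(startUrl);  // pages to fetch at the current level

        for (String selector : levels) {
            List<String> next = new ArrayList<>();
            for (String url : frontier) {
                try {
                    for (String link : fetcher.fetch(url, selector)) {
                        // Skip empty values and links that were already collected.
                        if (!link.isEmpty() && seen.add(link)) {
                            next.add(link);
                        }
                    }
                } catch (IOException e) {
                    // One unreachable page should not abort the whole scan.
                    System.err.println("Skipping " + url + ": " + e.getMessage());
                }
            }
            frontier = next;  // links found at this level are the pages of the next level
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // In-memory stand-in for a real site, so the sketch runs without network access.
        Map<String, List<String>> site = Map.of(
            "home|nav a", List.of("a", "b"),
            "a|side a", List.of("a1", "b"),
            "b|side a", List.of("b1"));
        LinkFetcher fake = (url, selector) ->
            site.getOrDefault(url + "|" + selector, List.of());

        // prints [a, b, a1, b1]
        System.out.println(crawl("home", new String[] {"nav a", "side a"}, fake));
    }
}
```

With Jsoup, the fetcher could plausibly be `(url, selector) -> Jsoup.connect(url).get().select(selector).eachAttr("abs:href")`. Note that this version lists links breadth-first (all level 1 links, then all level 2 links), whereas the recursive version lists each level 1 link followed directly by its level 2 links.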

