JSOUP HTML parsing from a URL

Question

I am using JSOUP in Java to parse HTML pages like these two: this one and this one.

In the first case, I get output.

I have a problem with the connection:

doc = Jsoup.connect(url).get();

Some URLs parse without trouble and I get the expected output, but others produce empty output like this:

Title: [].

The two URLs have the same form, so I can't see where the problem is. Here is my code:

Document doc;

try {
   doc = Jsoup.connect("http://ekonomika.sme.sk/c/8047766/s-velkymi-chybami-stavali-aj-budovu-centralnej-banky.html").get();
   String title = doc.title();
   System.out.println("title : " + title);      
} 
catch (IOException e) {
   e.printStackTrace();
}

Answer 1

Take a look at what the head of the document returned for the second URL contains:

Element h = doc.head();
System.out.println("head : " + h);

You will see some meta refresh tags and an empty title:

<head> 

 <noscript> 
  <meta http-equiv="refresh" content="1;URL='/c/8047766/s-velkymi-chybami-stavali-aj-budovu-centralnej-banky.html?piano_d=1'"> 
 </noscript> 

 <meta http-equiv="refresh" content="10;URL='/c/8047766/s-velkymi-chybami-stavali-aj-budovu-centralnej-banky.html?piano_t=1'"> 

 <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> 

 <title></title> 

</head>

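That explains the empty title: the page you fetched is only a placeholder that sends the browser on via meta refresh. You have to follow that redirect yourself, using the URL given in the content attribute.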
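As a rough sketch of what "following the redirect" could look like, the helper below pulls the target out of a meta refresh `content` value and resolves it against the page URL. `MetaRefresh` and `target` are names made up for this example (they are not part of jsoup), and the pattern only covers the simple `delay;URL=...` form shown in the head above.

```java
import java.net.URI;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of jsoup: finds the target URL of a meta refresh.
class MetaRefresh {

    // Matches content values such as "1;URL='/path?x=1'" or "10; url=/path".
    private static final Pattern REFRESH = Pattern.compile(
            "^\\s*\\d+\\s*;\\s*url\\s*=\\s*['\"]?([^'\"]+)['\"]?\\s*$",
            Pattern.CASE_INSENSITIVE);

    // Returns the absolute redirect target, or empty if content is not a URL refresh.
    static Optional<String> target(String content, String baseUrl) {
        Matcher m = REFRESH.matcher(content);
        if (!m.matches()) {
            return Optional.empty();
        }
        // Resolve relative targets (like "/c/...") against the URL of the fetched page.
        String resolved = URI.create(baseUrl).resolve(m.group(1).trim()).toString();
        return Optional.of(resolved);
    }

    public static void main(String[] args) {
        String base = "http://ekonomika.sme.sk/c/8047766/s-velkymi-chybami-stavali-aj-budovu-centralnej-banky.html";
        String content = "1;URL='/c/8047766/s-velkymi-chybami-stavali-aj-budovu-centralnej-banky.html?piano_d=1'";
        System.out.println(target(content, base).orElse("no refresh target"));
    }
}
```

With jsoup, the `content` value could presumably be read with `doc.select("meta[http-equiv=refresh]").attr("content")`, and the resolved target passed to a second `Jsoup.connect(...).get()` call. Which of the two refresh targets (`piano_d=1` or `piano_t=1`) actually returns the article is something you would have to try against the site.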

Answer 2

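This is my parsing code; with this URL I get no output.

/*
 * To change this license header, choose License Headers in Project Properties.
 * To change this template file, choose Tools | Templates
 * and open the template in the editor.
 */
package commentparser;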

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class CommentParser {

  public static void main(String[] args) {
    try {
        // followRedirects(true) covers HTTP redirects; it does not act on
        // <meta http-equiv="refresh"> tags inside the returned page
        Document doc = Jsoup.connect("http://ekonomika.sme.sk/c/8047766/s-velkymi-chybami-stavali-aj-budovu-centralnej-banky.html").followRedirects(true).get();

        String title = doc.title();
        System.out.println("title : " + title);

        // Links to the discussions (the loop simply does nothing if none match)
        Elements discussionLinks = doc.select("a[href^=/diskusie/reaction_show]");
        for (Element link : discussionLinks) {
            // get the value from the href attribute
            System.out.println("Diskusie: " + link.attr("href"));
        }

        // Author of the article
        Elements authors = doc.select("span[class^=autor]");
        for (Element author : authors) {
            System.out.println(author.text());
        }

        // All links on the page
        Elements links = doc.select("a[href]");
        for (Element link : links) {
            // get the value from the href attribute
            System.out.println("\nlink : " + link.attr("href"));
            System.out.println("text : " + link.text());
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
  }
}


2 answers

Answer 1
0 2015-10-24 22:17:08

Answer 2
0 2015-10-25 12:06:40
