Unexpected output when using Jsoup in Java

Question

I am getting this output when trying to use Jsoup to extract text from Wikipedia.

I don't have enough rep to post pictures as I am new to this site, but it basically looks like this:

[]{k[]q[]f[]d[]d  etc..

Here is part of my code:

public static void scrapeTopic(String url) {
    String html = getUrl("http://www.wikipedia.org/" + url);

    Document doc = Jsoup.parse(html);
    String contentText = doc.select("*").first().text();
    System.out.println(contentText);
}

It appears to get all the information, but in the wrong format!

I appreciate any help given. Thanks in advance.

Answer 1

Here are some suggestions. When fetching an ordinary web page that does not require HTTP header fields such as cookie or user-agent to be set, just call:

Document doc = Jsoup.connect("givenURL").get();

This call reads the web page using a GET request. When you select elements using *, it matches any element, that is, all the elements of the document. Hence, doc.select("*").first() returns the #root element, which is the document itself. Try printing it to see:

System.out.println(doc.select("*").first().tagName()); // #root
System.out.println(doc.select("*").first()); // prints the whole document
System.out.println(doc); // also prints the whole document, so the select above is pointless
System.out.println(doc.select("*").first() == doc); // checks whether they are the same object; prints true
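As a rough sketch of the point above: rather than taking the first match of *, you could select only the elements you are interested in, for example the paragraphs. The class name ParagraphText, the helper paragraphText, the example URL and the user-agent string are my own assumptions for illustration and are not from the question's code.

```java
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class ParagraphText {
    // Hypothetical helper: joins the text of every <p> element, one paragraph per line.
    static String paragraphText(Document doc) {
        StringBuilder sb = new StringBuilder();
        for (Element p : doc.select("p")) {
            if (sb.length() > 0) {
                sb.append('\n');
            }
            sb.append(p.text());
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumed example URL and user-agent string; replace with your own.
        Document doc = Jsoup.connect("https://en.wikipedia.org/wiki/Java")
                .userAgent("Mozilla/5.0")
                .get();
        System.out.println(paragraphText(doc));
    }
}
```

The helper takes an already parsed Document, so it works the same whether the page came from Jsoup.connect(...).get() or from Jsoup.parse(html).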

I am assuming that you are just playing around to learn this API. Selectors are very powerful, but a good start would be to try the general document manipulation functions, e.g., doc.getElementsByTag().
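A minimal sketch of getElementsByTag(), run against a small inline HTML string so that it does not depend on the network. The class name TagExample, the helper linkTargets and the sample HTML are assumptions made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class TagExample {
    // Hypothetical helper: collects the raw href attribute of every <a> element.
    static List<String> linkTargets(Document doc) {
        List<String> hrefs = new ArrayList<>();
        for (Element a : doc.getElementsByTag("a")) {
            hrefs.add(a.attr("href"));
        }
        return hrefs;
    }

    public static void main(String[] args) {
        // Made-up sample markup, standing in for a fetched page.
        String html = "<html><body><h1>Title</h1>"
                + "<a href=\"/wiki/A\">A</a> <a href=\"/wiki/B\">B</a>"
                + "</body></html>";
        Document doc = Jsoup.parse(html);

        // Text of the first <h1> element.
        System.out.println(doc.getElementsByTag("h1").first().text());
        // All link targets found in the document.
        System.out.println(linkTargets(doc));
    }
}
```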

However, on my local machine, I was able to fetch the document and parse it using your getUrl() function!
