
JSoup.connect throws 403 error while apache.httpclient is able to fetch the content

I am trying to parse the HTML dump of any given page. I used HTML Parser and also tried JSoup for parsing.

I found useful functions in Jsoup, but I am getting a 403 error when calling Document doc = Jsoup.connect(url).get();

I tried HttpClient to get the HTML dump, and it succeeded for the same URL.

Why is JSoup returning 403 for the same URL that Apache Commons HttpClient fetches successfully? Am I doing something wrong? Any thoughts?
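The likely difference is the default User-Agent header: Jsoup sits on top of HttpURLConnection, which identifies itself as Java/1.x unless you override the User-Agent, and some servers answer that with 403, while Apache HttpClient sends its own default (e.g. Apache-HttpClient/4.x). For reference, a minimal sketch of the kind of HttpClient 4.x call that succeeds here; the question doesn't show the exact code used (and mentions Commons HttpClient, whose 3.x API differs slightly), so this is illustrative:

import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

CloseableHttpClient client = HttpClients.createDefault();
// HttpClient sends its own default User-Agent header, so the server does not see a bare Java client
String html = EntityUtils.toString(client.execute(new HttpGet(url)).getEntity());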

The working solution is as follows (thanks to Angelo Neuschitzer for the reminder to post it as a solution):

import javax.swing.text.html.HTML;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Document doc = Jsoup.connect(url).userAgent("Mozilla").get();
Elements links = doc.getElementsByTag(HTML.Tag.CITE.toString()); // toString() needs parentheses; renders as "cite"
for (Element link : links) {
    String linkText = link.text();
    System.out.println(linkText);
}

So, userAgent does the trick :)
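If a plain "Mozilla" token ever stops being enough, Jsoup's Connection API also lets you send a full browser User-Agent string, a referrer, and a timeout. A minimal sketch; the header values below are illustrative, not required:

Document doc = Jsoup.connect(url)
        .userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64)") // illustrative full browser UA string
        .referrer("http://www.google.com")                      // some sites also check the Referer header
        .timeout(10000)                                         // milliseconds
        .get();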
