
jsoup connect parameter

I access a webpage by passing the session ID and URL, and the output is an HTML response. I want to use jsoup to parse this response and get the tag elements. The jsoup examples I have seen take a String to establish the connection. How do I proceed?

Pseudocode:

I tried the above method and got this exception:

java.io.IOException: 401 error loading URL http://www.abc.com/index
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:387)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:364)
    at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:143)
    at org.jsoup.helper.HttpConnection.get(HttpConnection.java:132)

Basically, entity.getContent() holds the HTML response, which has to be passed as a String to the connect method. But it doesn't work.

Apache Commons HttpClient and Jsoup do not share the same cookie store. You need to pass the very same cookies that HttpClient has retrieved on to Jsoup through its Connection. You can find some concrete examples here:

Alternatively, you can keep using HttpClient to fire the HTTP requests and maintain the cookies, and instead feed its HttpResponse as a String to Jsoup#parse().

So this should do:

HttpResponse httpResponse = httpclient1.execute(httpget, httpContext);
String html = EntityUtils.toString(httpResponse.getEntity());
Document doc = Jsoup.parse(html, testUrl);
// ...

By the way, you do not necessarily need to create a whole new HttpClient for a subsequent request. Just reuse the httpclient you already created. Also, your way of obtaining the response as a String is clumsy. The second line in the above example shows the simplest way to do it.

It shows an HTTP error 401, which means:

"Similar to 403 Forbidden, but specifically for use when authentication is possible but has failed or not yet been provided."

Therefore, I think you need to log in to the website from your Java code, or identify yourself by sending cookies with your request.
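
As a rough illustration of sending cookies along with the request, here is a minimal sketch that uses only the Java standard library. It builds the name-to-value map that jsoup's Connection.cookies(Map) accepts, and also renders the same cookies as a Cookie header value. The class and method names (CookieBridge, toCookieMap, toCookieHeader) and the cookie JSESSIONID=abc123 are hypothetical, chosen only for the example; the actual cookie names depend on the site. The jsoup call itself appears only in a comment, since it needs jsoup on the classpath and a reachable server.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CookieBridge {

    // Builds a name -> value map from alternating name/value strings.
    // With Apache HttpClient 4.x the pairs would instead be read from the
    // cookie store (each Cookie exposes getName() and getValue()).
    public static Map<String, String> toCookieMap(String... nameValuePairs) {
        if (nameValuePairs.length % 2 != 0) {
            throw new IllegalArgumentException("expected name/value pairs");
        }
        Map<String, String> cookies = new LinkedHashMap<>();
        for (int i = 0; i < nameValuePairs.length; i += 2) {
            cookies.put(nameValuePairs[i], nameValuePairs[i + 1]);
        }
        return cookies;
    }

    // Renders the cookies as one Cookie request header value, "a=1; b=2".
    public static String toCookieHeader(Map<String, String> cookies) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : cookies.entrySet()) {
            if (sb.length() > 0) {
                sb.append("; ");
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical session cookie; the real name and value come from the login response.
        Map<String, String> cookies = toCookieMap("JSESSIONID", "abc123");
        System.out.println(toCookieHeader(cookies));

        // With jsoup available, the same map could be handed to the connection:
        // Document doc = Jsoup.connect("http://www.abc.com/index").cookies(cookies).get();
    }
}
```

If the server still answers 401 with the cookies attached, the session has probably expired or the site expects a different form of authentication, so the login step has to be repeated first.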

