I've written a Java applet that fetches HTML from multiple pages on a single host and extracts data from them. I use Jsoup and it works perfectly, except that it automatically sends the cookies the browser already has for that host, and it also sends any newly set cookies on subsequent requests. (I believe this is done natively by Java.)
I want it to ignore all cookies set by the server when the applet is run and ignore any cookies that the browser may already have.
My code is very simple.
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

String url = "http://example.com/my/web-page.html";
Document document = Jsoup.connect(url).userAgent("<hard-coded static value>").get();
// Extract data from document with org.jsoup.nodes.Document.select(), etc.
This repeats with multiple URLs, all having the same host (example.com).
In summary, I basically want it to:
I've searched a lot and haven't found a solution, so I'd really appreciate any help. I don't mind using Apache HttpClient or another third-party library, but I'd prefer not to, so that I can keep the applet's file size small.
Thanks a ton in advance :)
You should manipulate org.jsoup.Connection.Request for this:
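import java.util.ArrayList;
import java.util.Map;
import org.jsoup.Connection;
import org.jsoup.Jsoup;

String url = "http://example.com/my/web-page.html";
Connection con = Jsoup.connect(url).userAgent("<hard-coded static value>");
// ...
con.get();
// ...
Connection.Request request = con.request();
Map<String, String> cookies = request.cookies();
// Iterate over a copy of the names: removeCookie() removes from this same
// map, so removing while looping over cookies.keySet() directly throws
// ConcurrentModificationException.
for (String cookieName : new ArrayList<String>(cookies.keySet())) {
    // filter: "continue" here for cookies you want to keep in the map
    request.removeCookie(cookieName);
}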
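The filter placeholder in that loop can be made concrete with an allow-list. The sketch below is a minimal, Jsoup-free version so it runs on its own: the map stands in for request.cookies(), and the allow-list and cookie names are hypothetical. It collects the names to drop into a separate list first, so the removal never modifies the map it is iterating over.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CookieFilter {
    // Returns the names of cookies NOT in the allow-list, collected into a
    // separate list so the caller can remove them without modifying the map
    // while iterating over it.
    static List<String> namesToRemove(Map<String, String> cookies, Set<String> keep) {
        List<String> drop = new ArrayList<String>();
        for (String name : cookies.keySet()) {
            if (!keep.contains(name)) {
                drop.add(name);
            }
        }
        return drop;
    }

    public static void main(String[] args) {
        // Stand-in for request.cookies(); names and values are illustrative only.
        Map<String, String> cookies = new LinkedHashMap<String, String>();
        cookies.put("session-id", "abc");
        cookies.put("tracking", "xyz");

        // Hypothetical allow-list of cookie names to keep.
        Set<String> keep = new HashSet<String>(Arrays.asList("session-id"));

        for (String name : namesToRemove(cookies, keep)) {
            cookies.remove(name); // with Jsoup: request.removeCookie(name)
        }
        System.out.println(cookies); // prints {session-id=abc}
    }
}
```

With Jsoup, the removal loop would become `for (String name : CookieFilter.namesToRemove(request.cookies(), keep)) request.removeCookie(name);`.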
You should also disable followRedirects and handle redirects manually, removing cookies at each hop. In other words, you will have to implement your own "cookie/domain remover".
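For the manual-redirect part, the decision at each hop is small enough to isolate. A minimal sketch, assuming the usual redirect status codes (301, 302, 303, 307, 308) and a hypothetical helper name, nextHop: it returns the absolute URL to request next, resolving a relative Location header against the current URL, or null when the response is not a redirect.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class RedirectHop {
    // Returns the absolute URL to request next, or null if the response is not
    // a redirect. Relative Location values are resolved against currentUrl.
    static String nextHop(String currentUrl, int status, String location)
            throws MalformedURLException {
        boolean isRedirect = status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
        if (!isRedirect || location == null) {
            return null;
        }
        return new URL(new URL(currentUrl), location).toString();
    }

    public static void main(String[] args) throws MalformedURLException {
        // prints http://example.com/login
        System.out.println(nextHop("http://example.com/my/web-page.html", 302, "/login"));
        // prints null
        System.out.println(nextHop("http://example.com/my/web-page.html", 200, null));
    }
}
```

In a Jsoup loop, each hop would be a fresh Jsoup.connect(next).followRedirects(false).execute() call, passing res.statusCode() and res.header("Location") to nextHop, setting no cookies on the new connection, and capping the number of hops to avoid redirect cycles.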
Jsoup internally uses java.net.HttpURLConnection, and you can't intercept the core step that actually makes the request, org.jsoup.helper.HttpConnection.Response.execute(...), because that method is static and package-private. You also can't set req (the private request object) or res (the private response object) in HttpConnection. Nor can you implement your own org.jsoup.Connection (or extend its implementation, HttpConnection, which has a private constructor) and force Jsoup to use it.
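Because the cookie handling happens below Jsoup, in java.net, one lever that doesn't require touching Jsoup at all is the JVM-wide java.net.CookieHandler that HttpURLConnection consults on each request and response. The sketch below builds a CookieManager with CookiePolicy.ACCEPT_NONE, which starts with an empty store (so it has nothing to send) and rejects every cookie a server sets. Whether the applet may install it is an assumption: under a SecurityManager, CookieHandler.setDefault requires the NetPermission "setCookieHandler", so a sandboxed (unsigned) applet would likely be refused. The helper name storedAfterResponse is hypothetical.

```java
import java.io.IOException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NoCookies {
    // A CookieManager with its own empty in-memory store and a policy that
    // rejects every cookie a server tries to set.
    static CookieManager rejectAll() {
        return new CookieManager(null, CookiePolicy.ACCEPT_NONE);
    }

    // Feeds one Set-Cookie value through the manager, the way
    // HttpURLConnection does after a response, and returns how many
    // cookies the store holds afterwards.
    static int storedAfterResponse(CookieManager manager, URI uri, String setCookie)
            throws IOException {
        Map<String, List<String>> headers = new HashMap<String, List<String>>();
        headers.put("Set-Cookie", Collections.singletonList(setCookie));
        manager.put(uri, headers);
        return manager.getCookieStore().getCookies().size();
    }

    public static void main(String[] args) throws IOException {
        CookieManager manager = rejectAll();
        URI uri = URI.create("http://example.com/my/web-page.html");

        // The cookie value is illustrative only.
        System.out.println(storedAfterResponse(manager, uri, "id=123; Path=/")); // prints 0

        // Install it JVM-wide so every HttpURLConnection, and therefore every
        // Jsoup request, uses it instead of the previous default handler.
        // With a SecurityManager present this needs NetPermission
        // "setCookieHandler"; otherwise it throws SecurityException.
        CookieHandler.setDefault(manager);
    }
}
```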
Considering all of the above, I'd advise using HttpClient or HtmlUnit instead, because otherwise you'll eventually end up "reinventing the wheel" in a restricted environment.
Instead of using the Connection returned by Jsoup.connect("url"), use the Response returned by execute():
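import java.util.HashMap;
import java.util.Map;
import org.jsoup.Connection.Method;
import org.jsoup.Connection.Response;
import org.jsoup.Jsoup;

// Start from an empty cookie map so no cookies are passed explicitly
Map<String, String> cookies = new HashMap<String, String>();
Response res = Jsoup
        .connect("url")
        .cookies(cookies)
        .userAgent("userAgent")
        .method(Method.GET) // or whichever method is needed
        .execute();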
I know it is a huge line, but that'll work fine.