Java applet - delete/ignore all cookies (JSoup)

I've written a Java applet that fetches HTML content from multiple pages on a single host and extracts data from it. I use Jsoup and it's working perfectly, but it automatically sends the cookies that the browser already holds for that host, and it also sends any newly set cookies on subsequent requests. (I believe this is done natively by Java.)

I want it to ignore all cookies set by the server when the applet is run and ignore any cookies that the browser may already have.

My code is very simple.

    String url = "http://example.com/my/web-page.html";
    Document document = Jsoup.connect(url).userAgent("<hard-coded static value>").get();
    // Extract data from document with org.jsoup.nodes.Document.select(), etc.

This repeats with multiple URLs, all having the same host (example.com).

In summary, I basically want it to:

  1. Ignore any cookies for example.com that might be set in the browser.
  2. If the server sets any new cookies when the applet makes a request, ignore them for subsequent requests. If possible, also block those cookies from being stored in the browser.

I've searched a lot and haven't been able to find a solution. I'd really appreciate any amount of help. I don't mind using Apache HTTPClient or any other third-party library, but I'd prefer not to so I can keep the applet's file size small.

Thanks a ton in advance :)

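You should manipulate org.jsoup.Connection.Request for this:

    import java.util.ArrayList;
    import java.util.Map;
    import org.jsoup.Connection;
    import org.jsoup.Jsoup;

    String url = "http://example.com/my/web-page.html";
    Connection con = Jsoup.connect(url).userAgent("<hard-coded static value>");
    ...
    con.get();
    ...
    Connection.Request request = con.request();
    Map<String, String> cookies = request.cookies();
    // Iterate over a copy of the names: removing entries while iterating
    // the map's own keySet() can throw ConcurrentModificationException.
    for (String cookieName : new ArrayList<String>(cookies.keySet())) {
        // Filter here: skip any cookie you want to keep in the map.
        request.removeCookie(cookieName);
    }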
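The filtering step mentioned in the comment above can also be done without an explicit loop. This is a minimal sketch, assuming the map returned by request.cookies() is the live, mutable map backing the request (this appears to be the case in Jsoup's HttpConnection, but check your version). CookieFilter and keepOnly are hypothetical names, not part of Jsoup.

```java
import java.util.Map;
import java.util.Set;

public final class CookieFilter {
    private CookieFilter() {}

    /**
     * Removes every cookie whose name is not in the allowed set.
     * Passing an empty set removes all cookies.
     */
    public static void keepOnly(Map<String, String> cookies, Set<String> allowed) {
        // retainAll on the key-set view removes the matching map entries too,
        // and avoids modifying the map from inside a for-each loop.
        cookies.keySet().retainAll(allowed);
    }
}
```

Under that assumption, CookieFilter.keepOnly(request.cookies(), Collections.<String>emptySet()) would clear every cookie from the request before it is reused.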

You should also disable followRedirects and handle redirects manually, removing cookies at each hop. You will have to implement your own "cookie/domain remover".
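Handling redirects manually could look roughly like the sketch below. It keeps the redirect loop separate from the HTTP library so that each hop is a fresh request carrying no cookies from the previous response. ManualRedirects, Hop, Fetcher, and follow are hypothetical names introduced for illustration; the Jsoup wiring shown in the comment is an assumption about how you might plug it in, not tested code.

```java
import java.io.IOException;
import java.net.URI;

public final class ManualRedirects {
    private ManualRedirects() {}

    /** One HTTP exchange, reduced to what the redirect loop needs. */
    public static final class Hop {
        public final String url;      // URL that was requested
        public final int status;      // HTTP status code
        public final String location; // Location header, or null if absent
        public final String body;     // response body (may be empty on redirects)

        public Hop(String url, int status, String location, String body) {
            this.url = url;
            this.status = status;
            this.location = location;
            this.body = body;
        }
    }

    /**
     * Hypothetical hook: performs ONE request with automatic redirects off and
     * without attaching any cookies. With Jsoup it might wrap something like:
     *   Connection.Response r = Jsoup.connect(url).followRedirects(false).execute();
     *   return new Hop(url, r.statusCode(), r.header("Location"), r.body());
     */
    public interface Fetcher {
        Hop fetch(String url) throws IOException;
    }

    /** Follows up to maxRedirects redirects and returns the first non-redirect response. */
    public static Hop follow(Fetcher fetcher, String startUrl, int maxRedirects) throws IOException {
        String url = startUrl;
        for (int redirects = 0; redirects <= maxRedirects; redirects++) {
            Hop hop = fetcher.fetch(url);
            boolean isRedirect = hop.status >= 300 && hop.status < 400 && hop.location != null;
            if (!isRedirect) {
                return hop;
            }
            // Resolve relative Location values against the URL just requested.
            url = URI.create(url).resolve(hop.location).toString();
        }
        throw new IOException("More than " + maxRedirects + " redirects starting at " + startUrl);
    }
}
```

Because the fetcher builds a new request for every hop, any Set-Cookie header on an intermediate response is simply never copied forward. The final Hop's body and url could then be handed to Jsoup.parse(body, url).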

JSoup uses java.net.HttpURLConnection internally, and you can't intercept the core step of invoking org.jsoup.helper.HttpConnection.Response.execute(...), because it is static and has package-private access. You also can't set req (the private request object) or res (the private response object) in HttpConnection. Moreover, you can't implement your own org.jsoup.Connection (or extend its implementation, HttpConnection, because of its private constructor) and force JSoup to use it.

Considering all of the above, I advise using HttpClient or HtmlUnit, because otherwise you'll eventually end up reinventing the wheel in a restricted environment.
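If the cookies really do come from Java's own networking layer, as the question suspects, one option that needs no extra library is to replace the JVM-wide java.net.CookieHandler, which HttpURLConnection consults when sending requests and receiving responses. The sketch below is an illustration under assumptions rather than a verified fix for applets: NoCookies and its methods are hypothetical names, and CookieHandler.setDefault requires the NetPermission "setCookieHandler" when a security manager is present, so an unsigned applet would likely get a SecurityException.

```java
import java.io.IOException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public final class NoCookies {
    private NoCookies() {}

    /** Builds a manager whose policy refuses every cookie offered to it. */
    public static CookieManager rejectingManager() {
        return new CookieManager(null, CookiePolicy.ACCEPT_NONE);
    }

    /** Replaces the JVM-wide handler; call once, before the first request. */
    public static void install() {
        CookieHandler.setDefault(rejectingManager());
    }

    /** Offers one Set-Cookie header to the manager and reports how many cookies it kept. */
    public static int cookiesKept(CookieManager manager, String setCookieHeader) {
        Map<String, List<String>> headers =
                Collections.singletonMap("Set-Cookie", Collections.singletonList(setCookieHeader));
        try {
            manager.put(URI.create("http://example.com/"), headers);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return manager.getCookieStore().getCookies().size();
    }
}
```

Calling NoCookies.install() once, before the first Jsoup.connect(...), should mean that nothing the server sets is stored or replayed by that handler. Whether this also stops the browser's existing cookies from being attached depends on how the plug-in wires its own handler, so treat it as something to test rather than a guarantee.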

Instead of working with the Connection (the object returned by Jsoup.connect("url")) directly, call execute() and work with the Response:

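    import java.util.HashMap;
    import java.util.Map;
    import org.jsoup.Connection.Method;
    import org.jsoup.Connection.Response;
    import org.jsoup.Jsoup;

    // Empty map: Jsoup itself adds no cookies to the request.
    Map<String, String> cookies = new HashMap<String, String>();

    Response res = Jsoup
        .connect("url")
        .cookies(cookies)
        .userAgent("userAgent")
        .method(Method.GET) // or whatever method is needed
        .execute();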

I know it is a huge line, but that'll work fine.
