
Cookie session using HttpURLConnection

I'm developing an application that logs in to a website. My problem is that, when I inspect the browser's request headers, there is a cookie that the browser sends. I need to know how to do the same in my application, that is, have the connection set the request cookies by itself when it is opened. I tried CookieHandler.setDefault( new CookieManager( null, CookiePolicy.ACCEPT_ALL ) ); but it didn't work.

Source:

        CookieHandler.setDefault( new CookieManager( null, CookiePolicy.ACCEPT_ALL ) );
        URL url2 = new URL("https://m.example.com.br/login.jhtml");
        HttpURLConnection conn = (HttpURLConnection) url2.openConnection();
        conn.setRequestProperty("Content-Type","application/x-www-form-urlencoded");
        conn.setRequestMethod("POST");
        conn.setRequestProperty("User-Agent","User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:58.0) Gecko/20100101 Firefox/58.0");
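        // Content-Length must be the byte count only, not the body followed by the count
        conn.setRequestProperty("Content-Length", Integer.toString(parameters.getBytes().length));
        // setFollowRedirects is static and global; this is the per-connection setter
        conn.setInstanceFollowRedirects(true);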
        conn.setDoInput(true);
        conn.setDoOutput(true);
        conn.setUseCaches(false);
        DataOutputStream wr = new DataOutputStream(conn.getOutputStream());
        wr.writeBytes(parameters);
        wr.flush();
        wr.close();
        if (conn.getResponseCode()== 200){
            InputStream in =  conn.getInputStream();
            BufferedReader rd = new BufferedReader(new InputStreamReader(in));
            String line=null;
            StringBuffer response = new StringBuffer();
            while((line = rd.readLine()) != null) {
                response.append(line);
                response.append('\r');
            }
            rd.close();
            System.out.println(response.toString());
        }

Request headers sent by my application:

Content-Type: application/x-www-form-urlencoded
User-Agent: User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:58.0) Gecko/20100101 Firefox/58.0
Connection: Keep-Alive
Accept-Encoding: gzip
Cookie: TS0163e05c="01ed0a5ec20a04efb37decf4185e55cfe68e06164c32f1a95d1d5b8f12c72abbee029ed64985c09681a55832e444c61821a1eb6fb22d6ed9880314fa0c342074316e309642";$Path="/";$Domain="example.com"; ps-website-switching-v2=%7B%22ps-website-switching%22%3A%22ps-website%22%7D; TS015a85bd=01ed0a5ec25aecf271e4e08c02f852e9ea6199a117a0a8e0339b3e98fd1d51518e5f09ead481039d4891f66e9cc48a13ced14792de
Content-Length: 198

Request headers sent by the browser:

Host: m.example.com
Connection: keep-alive
Content-Length: 197
Cache-Control: max-age=0
Upgrade-Insecure-Requests: 1
Content-Type: application/x-www-form-urlencoded
User-Agent: Mozilla/5.0 (Linux; Android 5.0.2; LG-D337 Build/LRX22G) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.137 Mobile Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Cookie: _ga=GA1.3.313511484.1517525889; _gid=GA1.3.507266479.1517525889; DEretargeting=563; CSASF=; JS_SESS=; BT=%3B106%3B; DN....

Pay attention to the cookies: why are they so different? What can I do to send cookies like the browser's without having to set them manually with conn.setRequestProperty("Cookie",cookie); ?

HttpURLConnection is not a very reliable way to scrape or interact with websites, for the following reasons:

  • HttpURLConnection doesn't understand JavaScript. JavaScript can set cookies as well as provide major parts of functionality of the website.
  • HttpURLConnection doesn't download all resources associated with a page, like other .html files in frames, images (0 px images can sometimes also add cookies, but if you never get them, you'll never get the cookie), JavaScript, and so on.
  • CookieHandler only works for cookies that are passed to you directly in the HTTP Response Headers. If anything within the content of the site (including embedded content like images) would cause more cookies to be created, you're not getting them with CookieHandler, because it doesn't understand HTML/JS/etc.
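The last point can be checked without touching the network. The following is only a minimal sketch: the class name HeaderCookieDemo, the SESSION cookie, and its value are made up for illustration. It feeds a simulated response into a CookieManager and asks which Cookie values it would attach to the next request for the same URL. Only what arrived in a Set-Cookie header comes back; a cookie that a script would have created in a browser (such as _ga in the question) never reaches the store.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class HeaderCookieDemo {

    // Feeds a simulated response into a CookieManager and returns the Cookie
    // values it would attach to the next request for the same URI.
    static List<String> cookiesForNextRequest() {
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        URI uri = URI.create("https://m.example.com.br/login.jhtml");

        // Hypothetical response headers: the only cookie source the manager sees.
        Map<String, List<String>> responseHeaders = new HashMap<String, List<String>>();
        responseHeaders.put("Set-Cookie",
                Collections.singletonList("SESSION=abc123; Path=/"));
        try {
            manager.put(uri, responseHeaders);
            Map<String, List<String>> requestHeaders =
                    manager.get(uri, new HashMap<String, List<String>>());
            List<String> cookies = requestHeaders.get("Cookie");
            return cookies == null ? Collections.<String>emptyList() : cookies;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Prints the cookie values that would go into the Cookie request header.
        System.out.println(cookiesForNextRequest());
    }
}
```

With CookieHandler.setDefault(manager) in place, HttpURLConnection performs the same put/get steps internally on every response and request, which is why the application's Cookie header in the question contains only the server-issued TS... cookies.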

You should use Selenium instead. Selenium automates a real web browser (or at least something closer to a real web browser) that can parse HTML and behaves according to the expectations of the web standards.
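If the missing cookies are collected from a real browser session (through Selenium or copied from the browser's developer tools), one possible way to hand them to HttpURLConnection without calling setRequestProperty("Cookie", ...) is to add them to the CookieManager's store. This is a sketch under assumptions: the class and method names are hypothetical, and the two cookie values are copied from the browser header in the question purely as examples. Note that an HttpCookie created with the constructor defaults to version 1, which is serialized in the quoted $Path/$Domain style seen in the application's header above, so the sketch switches it to version 0.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpCookie;
import java.net.URI;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SeededCookieDemo {

    // Builds a Netscape-style (version 0) cookie valid for every path on the host.
    static HttpCookie browserCookie(String name, String value, String domain) {
        HttpCookie cookie = new HttpCookie(name, value);
        cookie.setVersion(0);      // the constructor defaults to version 1
        cookie.setDomain(domain);
        cookie.setPath("/");       // without a path the manager will not match it
        return cookie;
    }

    // Seeds a CookieManager and returns the Cookie values it would send.
    static List<String> seededCookieHeader() {
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        URI uri = URI.create("https://m.example.com.br/login.jhtml");
        String host = uri.getHost();

        // Example values taken from the browser request shown in the question.
        manager.getCookieStore().add(uri, browserCookie("DEretargeting", "563", host));
        manager.getCookieStore().add(uri,
                browserCookie("_gid", "GA1.3.507266479.1517525889", host));

        // In a real program, CookieHandler.setDefault(manager) would go here,
        // before any connection is opened.
        try {
            Map<String, List<String>> headers =
                    manager.get(uri, new HashMap<String, List<String>>());
            List<String> cookies = headers.get("Cookie");
            return cookies == null ? Collections.<String>emptyList() : cookies;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(seededCookieHeader());
    }
}
```

This only replays cookies that already exist; it does not make HttpURLConnection run the scripts that created them, so values that the site refreshes from JavaScript will go stale.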

As far as which browser driver (backend) to use, here are a few options:

  • HtmlUnit, which is perhaps the fastest driver (at least it has the least memory overhead), with the downside that not all of the latest web standards and technologies are supported. If it works with your site, it's probably the best choice because it doesn't require any native driver component. It does support a fair chunk of JavaScript, though its feature support is likely less up to date than that of the latest Firefox or Chrome. HtmlUnit is headless.
  • PhantomJS, which is based on a fairly dated version of WebKit. Your web standards support will be current as of about 2013, which is probably fine for most websites, but some cutting-edge features won't work. It also has a fair number of bugs with certain types of content. However, it's also headless, has a pretty big user base, and generally has lower overhead than a full-blown browser.
  • Firefox, Chrome, Edge, Opera, or IE. Firefox and Chrome now have headless support as an option.

The difference between "headless" and "not headless" (or "headed", if you prefer) is that a headless browser does not create any GUI windows. If you're running on a headless Linux box, this is practically a requirement unless you want to set up an Xvfb virtual X server or something similar. If you're running this from a computer with a graphical interface (Windows, macOS, or desktop Linux), it's up to you whether you want to see the browser pop up when you run your code.

Headless browsers do tend to be relatively faster, and you can scale out more instances of them in parallel because they aren't taking up any graphics resources on your system as you use them. They just use the browser engine itself to process the web content and allow you to access/drive it through Selenium.

If you do want headless, but you need the very latest web platform features and standards support, look into using Headless Chrome or Headless Firefox.

Headless Chrome intro: https://developers.google.com/web/updates/2017/04/headless-chrome

Headless Firefox intro: https://developer.mozilla.org/en-US/Firefox/Headless_mode

