Java - How to get and add cookies to request header correctly?

I need to visit a number of different pages on a web site and collect information, and I'm not sure how to handle cookies. In the Network tab of the Chrome debugger (F12), I can see the request properties and cookies being sent. If I explicitly add the cookie for one of the pages (see the commented-out con.setRequestProperty("Cookie", ...) line), the info is successfully retrieved.

            URL url = new URL(urlStr);
            HttpURLConnection con = (HttpURLConnection) url.openConnection();
            con.setRequestMethod("GET");
            con.setRequestProperty("Host", county +"." +referer +".com");
            con.setRequestProperty("Connection", "keep-alive");
            con.setRequestProperty("Accept", "application/json, text/javascript, */*; q=0.01");
            con.setRequestProperty("X-Requested-With", "XMLHttpRequest");
            con.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36");
            con.setRequestProperty("Origin", "http://evil.com/");
            con.setRequestProperty("Referer", "https://" +county +"." +referer +".com/index.cfm?zaction=AUCTION&Zmethod=PREVIEW&AUCTIONDATE=" +df.format(date));
            con.setRequestProperty("Accept-Language", "en-US,en;q=0.9");
            //con.setRequestProperty("Cookie", "cfid=9ed9c083-4696-4712-950d-1c0ad0727883; cftoken=0; AWSELB=CF13C5A70AE16731FBD093515EF0DDB58935BEB4D69838721C70C3BED039F919AF343D891D9A2001BD1070AC4C076AA72DF0A7EA6AEED1091BCD24CC7203622E75C0DE5C92; _gcl_au=1.1.1696117075.1563489288; __utmc=119398810; __utmz=119398810.1563489288.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); CF_CLIENT_" +county.toUpperCase() +"_" +referer.toUpperCase() +"_TC=1563505029291; __utma=119398810.1711105058.1563489288.1563498837.1563505090.3; __utmt_UA-51657054-1=1; __utmb=119398810.10.10.1563505090; testcookiesenabled=disabled; CF_CLIENT_" +county.toUpperCase() +"_" +referer.toUpperCase() +"_LV=1563508162268; CF_CLIENT_" +county.toUpperCase() +"_" +referer.toUpperCase() +"_HC=221");

            //handle cookies
            String cookiesHeader = con.getHeaderField("Set-Cookie");
            List<HttpCookie> cookies = HttpCookie.parse(cookiesHeader);
            CookieManager cookieManager = new CookieManager();
            cookies.forEach(cookie -> cookieManager.getCookieStore().add(null, cookie));
            con.disconnect();
            con = (HttpURLConnection) url.openConnection();     //create new connection with cookies
            con.setRequestProperty("Cookie", StringUtils.join(cookieManager.getCookieStore().getCookies(), ";"));

            BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()));
            StringBuilder stringBuilder = new StringBuilder();
            while ((str = in.readLine()) != null) {
                stringBuilder.append(str);
            }
            in.close();
            con.disconnect();

But if I use the code in the "handle cookies" section (from the tutorial https://www.baeldung.com/java-http-request), an empty data set is returned. Can someone spot what I am doing wrong?

The line String cookiesHeader = con.getHeaderField("Set-Cookie"); reads the cookies from the response. In your case it is not reading anything, because the HTTP request has not been executed yet.

So you first need to execute the request; only then can you read the cookies from the response. Add a con.connect() call before the getHeaderField("Set-Cookie") line. The rest of the code then adds the received cookies to the next request.

con.connect();
String cookiesHeader = con.getHeaderField("Set-Cookie");

You can also check that the request succeeded first, and only then read the cookies and continue with the rest of the process:

int statusCode = con.getResponseCode();
if (statusCode == 200) {
   String cookiesHeader = con.getHeaderField("Set-Cookie");
   //rest of the code
}
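One related detail: getHeaderField("Set-Cookie") returns a single value even when the server sends several Set-Cookie headers, so some cookies can be lost. Below is a minimal sketch of building the Cookie request header from all of them. The class and method names (CookieHeaderBuilder, buildCookieHeader) are hypothetical and not part of the original code; it only sends back name=value pairs and ignores attributes such as Path or expiry.

```java
import java.net.HttpCookie;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not from the original post.
class CookieHeaderBuilder {

    /**
     * Turns raw Set-Cookie response-header values into one Cookie
     * request-header value of the form "name1=value1; name2=value2".
     */
    static String buildCookieHeader(List<String> setCookieValues) {
        List<String> pairs = new ArrayList<>();
        if (setCookieValues == null) {
            return ""; // the response carried no Set-Cookie header
        }
        for (String headerValue : setCookieValues) {
            // HttpCookie.parse drops attributes such as Path and HttpOnly
            // from the name/value pair.
            for (HttpCookie cookie : HttpCookie.parse(headerValue)) {
                pairs.add(cookie.getName() + "=" + cookie.getValue());
            }
        }
        return String.join("; ", pairs);
    }
}
```

With the connection from the question, the input list would come from con.getHeaderFields().get("Set-Cookie"), and the result would be passed to setRequestProperty("Cookie", ...) on the next connection.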

It appears I might be barking up the wrong tree. The URL has parameters that apparently change over time, as you can see below.

https://brevard.realforeclose.com/index.cfm?zaction=AUCTION&Zmethod=UPDATE&FNC=LOAD&AREA=W&PageDir=0&doR=1&tx=1563563124890&bypassPage=1&test=1&_=1563563124891

https://brevard.realforeclose.com/index.cfm?zaction=AUCTION&Zmethod=UPDATE&FNC=LOAD&AREA=W&PageDir=0&doR=1&tx=1563508160468&bypassPage=1&test=1&_=1563508160468

I don't know what the numbers mean or how to supply the right one at the right time. The first URL, which was created yesterday, now returns an empty set; the second one, created just now, returns good data.

Edit: Well, I figured out what the numbers mean. There is a separate query to get the time in milliseconds in New York, plus an offset. I've implemented that query, and it now creates a valid URL that always returns good data if I paste it into a new browser window. But my Java code still isn't getting that data.
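As a rough sketch of that step, the URL could be assembled from the timestamp like this. The class and method names (AuctionUrlBuilder, buildUpdateUrl) are hypothetical. It assumes the same value is used for both tx and _, as in the second example URL above (in the first one they differ by 1), and it takes the server time as a parameter because the time query and its offset are not shown here.

```java
// Hypothetical helper, not from the original post.
class AuctionUrlBuilder {

    /**
     * Builds the UPDATE/LOAD URL for a given county and site name.
     * serverMillis is assumed to be the value returned by the site's
     * time query (epoch milliseconds plus the site's offset).
     */
    static String buildUpdateUrl(String county, String site, long serverMillis) {
        return "https://" + county + "." + site + ".com/index.cfm"
                + "?zaction=AUCTION&Zmethod=UPDATE&FNC=LOAD&AREA=W&PageDir=0&doR=1"
                + "&tx=" + serverMillis
                + "&bypassPage=1&test=1"
                + "&_=" + serverMillis;
    }
}
```

For example, buildUpdateUrl("brevard", "realforeclose", 1563508160468L) reproduces the second URL shown above.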

Here are the request headers and other data I see in the Network tab of the Chrome debugger (F12) when I access the data the official way, from their link:

General

Request URL: https://brevard.realforeclose.com/index.cfm?zaction=AUCTION&Zmethod=UPDATE&FNC=LOAD&AREA=W&PageDir=0&doR=1&tx=1563630471816&bypassPage=1&test=1&_=1563630471816
Request Method: GET
Status Code: 200 OK
Remote Address: 34.236.53.129:443
Referrer Policy: no-referrer-when-downgrade

Response Headers

Access-Control-Allow-Headers: content-type
Access-Control-Allow-Methods: POST, GET, OPTIONS, PUT, DELETE
Access-Control-Allow-Origin: *
Allow: POST, GET, OPTIONS, PUT, DELETE
Connection: keep-alive
Content-Encoding: gzip
Content-Length: 1179
Content-Type: text/html;charset=UTF-8
Date: Sat, 20 Jul 2019 13:47:52 GMT
Server: Realforeclose/1a
Vary: Accept-Encoding

Request Headers

Provisional headers are shown
Accept: application/json, text/javascript, */*; q=0.01
Referer: https://brevard.realforeclose.com/index.cfm?zaction=AUCTION&Zmethod=PREVIEW&AUCTIONDATE=07/25/2019
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36
X-Requested-With: XMLHttpRequest

Query String Parameters

zaction=AUCTION&Zmethod=UPDATE&FNC=LOAD&AREA=W&PageDir=0&doR=1&tx=1563630471816&bypassPage=1&test=1&_=1563630471816
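One thing worth checking against the header list above: a connection returned by url.openConnection() starts with none of the request properties set on an earlier connection object, so they have to be applied again each time a connection is re-created. The following is a minimal sketch of a helper that does this for the four request headers Chrome shows. The class and method names (BrowserHeaders, open) are hypothetical, and whether these headers are enough for the server to return data is an assumption that has not been verified.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper, not from the original post.
class BrowserHeaders {

    /**
     * Opens a connection object (no network traffic happens yet) and
     * applies the request headers listed in the Chrome Network tab.
     */
    static HttpURLConnection open(String urlStr, String referer) {
        try {
            HttpURLConnection con = (HttpURLConnection) new URL(urlStr).openConnection();
            con.setRequestMethod("GET");
            con.setRequestProperty("Accept", "application/json, text/javascript, */*; q=0.01");
            con.setRequestProperty("Referer", referer);
            con.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                    + "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36");
            con.setRequestProperty("X-Requested-With", "XMLHttpRequest");
            return con;
        } catch (IOException e) {
            // Wrapped so callers do not need a checked-exception handler.
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling this helper for both the first request and the follow-up request that carries the Cookie header keeps the two requests consistent.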

