
Java URLConnection: the cookie is not set

I am trying to develop an Instagram scraper; this is my code:

 try {
            String url = "https://instagram.com/" + txtUsername.getText() + "?__a=1";
            System.out.println("search in " + url);
            URLConnection connection = new URL(url).openConnection();

            /*connection
                    .setRequestProperty("User-Agent",
                            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");*/
            connection
                    .setRequestProperty("Cookie",
                            "sessionid=XXXXXXXXXXXXXXXXXXXXX"); // setting cookie

            connection.connect();

            StringBuilder sb = new StringBuilder();
            // try-with-resources closes the reader once the response has been read
            try (BufferedReader r = new BufferedReader(new InputStreamReader(connection.getInputStream(),
                    Charset.forName("UTF-8")))) {
                String line;
                // Read each line exactly once; calling readLine() in both the
                // loop condition and the body would skip every other line.
                while ((line = r.readLine()) != null) {
                    sb.append(line).append('\n');
                }
            }
            System.out.println(sb.toString());
        } catch (MalformedURLException ex) {
            Logger.getLogger(MainFrame.class.getName()).log(Level.SEVERE, null, ex);
        } catch (IOException ex) {
            Logger.getLogger(MainFrame.class.getName()).log(Level.SEVERE, null, ex);
        }

I am trying to set a session cookie to simulate a login, so that I can view a user's page and get the data (followers, following, etc.) from this link: https://www.instagram.com/username/?__a=1. The problem is that the cookie does not seem to be set: what I receive as output on the console is the source code of the Instagram login page, which means the cookie was not sent (or that the session is wrong, but I'm sure it's right). How can I solve this problem and set the cookie?

The web server sets the session ID cookie. You can find it in Chrome under F12 -> Application -> Cookies, and it should also appear in the response headers of the home page. You can try two things:

If you want to simulate the login using core Java, you need to use setRequestProperty to set most of the headers your browser sends when it makes the login request (in Chrome, see F12 -> Network -> Headers -> Request Headers), including the initial session cookie. This approach might not work, though, since a large enterprise web app has multiple layers of security. With simple APIs or static web pages it would be straightforward.
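As a rough sketch of the first option, the helper below applies a set of browser-like headers to a URLConnection before it connects. The class name BrowserLikeRequest, the Accept value, and the session id argument are assumptions made for illustration; the real header names and values have to be copied from your own browser's Network tab.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;
import java.util.LinkedHashMap;
import java.util.Map;

final class BrowserLikeRequest {

    // Hypothetical header set: replace these values with the ones your
    // browser actually sends. The session id is supplied by the caller.
    static Map<String, String> exampleHeaders(String sessionId) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("User-Agent",
                "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
        headers.put("Accept", "application/json");
        headers.put("Cookie", "sessionid=" + sessionId);
        return headers;
    }

    // Opens a URLConnection (without connecting yet) and applies every header.
    // Headers must be set before connect() or getInputStream() is called.
    static URLConnection open(String url, Map<String, String> headers) {
        try {
            URLConnection connection = URI.create(url).toURL().openConnection();
            for (Map.Entry<String, String> header : headers.entrySet()) {
                connection.setRequestProperty(header.getKey(), header.getValue());
            }
            return connection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A call such as BrowserLikeRequest.open(url, BrowserLikeRequest.exampleHeaders(sessionId)) returns a connection that is ready for getInputStream(). Whether the server accepts the request still depends on its own checks.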

What would have a higher chance of success is using a testing framework such as Selenium with ChromeDriver for Chrome or GeckoDriver for Firefox. You instruct the driver to log in with your user, then access the user page, and then parse the page as you wanted.

Keep in mind that both approaches might violate Instagram's policies, and even if you succeed, requests from your IP might end up being redirected by their team.

Maintaining the session

You can use the CookieHandler API to maintain cookies. You need to install a CookieManager with a CookiePolicy of ACCEPT_ALL before sending any HTTP requests.

// First set the default cookie manager.
CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));

// All subsequent URLConnections will use the same cookie manager.
URLConnection connection = new URL(url).openConnection();
// ...

connection = new URL(url).openConnection();
// ...

connection = new URL(url).openConnection();
// ...
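
If you already know the session cookie value, one possible variation is to place it in the cookie manager's store up front, so that it is attached to matching requests. This is only a sketch: the class SessionCookieSeeder and its method names are made up for illustration, and the cookie name and value are whatever your browser shows.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpCookie;
import java.net.URI;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class SessionCookieSeeder {

    // Creates a CookieManager that already holds one cookie for the given site.
    static CookieManager managerWithCookie(URI site, String name, String value) {
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        HttpCookie cookie = new HttpCookie(name, value);
        cookie.setPath("/");   // match every path on the site
        cookie.setVersion(0);  // plain name=value form instead of the RFC 2965 form
        manager.getCookieStore().add(site, cookie);
        return manager;
    }

    // Returns the Cookie header values the manager would attach to a request for uri.
    static List<String> cookiesFor(CookieManager manager, URI uri) {
        try {
            Map<String, List<String>> headers =
                    manager.get(uri, Collections.<String, List<String>>emptyMap());
            List<String> cookies = headers.get("Cookie");
            return cookies == null ? Collections.<String>emptyList() : cookies;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Passing the result of managerWithCookie(...) to CookieHandler.setDefault(...) makes later URLConnections use it. Because no domain is set on the cookie in this sketch, it is tied to the exact host of the site URI, so a cookie stored for www.instagram.com is not returned for a request to instagram.com.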

Note that this is known not to work properly in all circumstances. If it fails for you, it is best to gather and set the cookie headers manually. You basically need to grab all Set-Cookie headers from the response of the login or the first GET request and then pass them on in the subsequent requests.

// Gather all cookies on the first request.
URLConnection connection = new URL(url).openConnection();
List<String> cookies = connection.getHeaderFields().get("Set-Cookie");
// ...

// Then use the same cookies on all subsequent requests.
connection = new URL(url).openConnection();
if (cookies != null) { // null when the first response set no cookies
    for (String cookie : cookies) {
        connection.addRequestProperty("Cookie", cookie.split(";", 2)[0]);
    }
}
// ...

The split(";", 2)[0] is there to get rid of cookie attributes that are irrelevant for the server side, such as expires, path, etc. Alternatively, you could use cookie.substring(0, cookie.indexOf(';')) instead of split(), provided the header actually contains a ';' (otherwise indexOf returns -1 and substring throws an exception).
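The attribute stripping can also be pulled into a small helper that builds a single Cookie header value from all Set-Cookie headers. This is a minimal sketch; the class name CookieHeaders and its methods are hypothetical and not part of the JDK.

```java
import java.util.ArrayList;
import java.util.List;

final class CookieHeaders {

    // Strips attributes such as expires and path, keeping only "name=value".
    // Also works when the header contains no ';' at all.
    static String nameValue(String setCookieHeader) {
        return setCookieHeader.split(";", 2)[0].trim();
    }

    // Joins the name=value pairs of all Set-Cookie headers into one Cookie
    // header value. Returns an empty string when the list is null, which is
    // what getHeaderFields().get("Set-Cookie") yields if no cookie was set.
    static String toCookieHeader(List<String> setCookieHeaders) {
        List<String> pairs = new ArrayList<>();
        if (setCookieHeaders != null) {
            for (String header : setCookieHeaders) {
                pairs.add(nameValue(header));
            }
        }
        return String.join("; ", pairs);
    }
}
```

The result can then be passed to connection.setRequestProperty("Cookie", ...) on each subsequent request, which sends all pairs in one header instead of adding them one by one.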

