
Login to https website and download a page using only core Java APIs

I want to log in to an HTTPS website with a username and password, go to one URL on that website, and download the page at that URL (and maybe parse the contents of that page). I want to do this using only core Java APIs and not HtmlUnit, jsoup, etc. I found the code below while learning how to do this, but it does not show how to log in to a website. Please tell me how I can log in, maintain a session, and then finally close the connection.

Source - http://www.mkyong.com/java/java-https-client-httpsurlconnection-example/

import java.net.MalformedURLException;
import java.net.URL;
import java.security.cert.Certificate;
import java.io.*;

import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLPeerUnverifiedException;

public class HttpsClient{

   public static void main(String[] args)
   {
        new HttpsClient().testIt();
   }

   private void testIt(){

      String https_url = "https://www.google.com/";
      URL url;
      try {

         url = new URL(https_url);
         HttpsURLConnection con = (HttpsURLConnection)url.openConnection();

         //dump all cert info
         print_https_cert(con);

         //dump all the content
         print_content(con);

      } catch (MalformedURLException e) {
         e.printStackTrace();
      } catch (IOException e) {
         e.printStackTrace();
      }

   }

   private void print_https_cert(HttpsURLConnection con){

      if(con!=null){

         try {

            System.out.println("Response Code : " + con.getResponseCode());
            System.out.println("Cipher Suite : " + con.getCipherSuite());
            System.out.println("\n");

            Certificate[] certs = con.getServerCertificates();
            for(Certificate cert : certs){
               System.out.println("Cert Type : " + cert.getType());
               System.out.println("Cert Hash Code : " + cert.hashCode());
               System.out.println("Cert Public Key Algorithm : "
                                    + cert.getPublicKey().getAlgorithm());
               System.out.println("Cert Public Key Format : "
                                    + cert.getPublicKey().getFormat());
               System.out.println("\n");
            }

         } catch (SSLPeerUnverifiedException e) {
            e.printStackTrace();
         } catch (IOException e){
            e.printStackTrace();
         }

      }

   }

   private void print_content(HttpsURLConnection con){
      if(con!=null){

         System.out.println("****** Content of the URL ********");

         // try-with-resources closes the reader even if reading fails
         try (BufferedReader br =
                 new BufferedReader(
                     new InputStreamReader(con.getInputStream()))) {

            String input;

            while ((input = br.readLine()) != null){
               System.out.println(input);
            }

         } catch (IOException e) {
            e.printStackTrace();
         }

      }

   }

}

Every website manages logins differently. You will need to scout the website, find out how the session is maintained, and mimic the browser's requests closely enough that the server can't tell that it is not talking to a browser.

In general, a web server stores a secret session token (often a hash) in a cookie. Here is the process:

  1. Post the login and password to the site's login URL, using HttpsURLConnection to send the form.
  2. The server responds with a header (Set-Cookie) carrying the hash that it wants stored in a cookie. The cookie usually has "session" in its name.
  3. Send later requests back with that hash in the Cookie header, with exactly the value the server gave you.
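
Steps 2 and 3 come down to turning the server's Set-Cookie values into one Cookie request header. Below is a minimal sketch of that conversion using only core Java. The class name SessionCookies, the method name toCookieHeader, and the cookie names in main are hypothetical, and the sketch ignores cookie attributes such as expiry and path, which a real site may rely on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of any library.
final class SessionCookies {

    // Build the value of a Cookie request header from Set-Cookie response values.
    static String toCookieHeader(List<String> setCookieValues) {
        List<String> pairs = new ArrayList<>();
        for (String value : setCookieValues) {
            // Everything after the first ';' is an attribute (Path, HttpOnly, ...),
            // which the client keeps to itself and does not send back.
            int semi = value.indexOf(';');
            String pair = (semi < 0 ? value : value.substring(0, semi)).trim();
            if (!pair.isEmpty()) {
                pairs.add(pair);
            }
        }
        return String.join("; ", pairs);
    }

    public static void main(String[] args) {
        // Example values as a server might send them (made up for illustration).
        List<String> fromServer = Arrays.asList(
                "JSESSIONID=abc123; Path=/; HttpOnly",
                "theme=dark");
        System.out.println(toCookieHeader(fromServer));
    }
}
```

In real use, the list would typically come from con.getHeaderFields().get("Set-Cookie") on the login response, and the result would be passed to setRequestProperty("Cookie", ...) on the next connection before it connects.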

All of the above can be done using only URL and HttpsURLConnection, but you will need to mimic a browser exactly to trick the server.
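
The whole flow might look roughly like the sketch below. It installs java.net.CookieManager as the default cookie handler so that the JDK stores and resends the session cookie, posts the form, fetches the page, and disconnects each connection when done. The form field names "username" and "password", the class name LoginSketch, and the command-line arguments are assumptions; the real field names and URLs have to come from scouting the site, and many sites also require extra hidden fields or tokens that this sketch does not handle.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

import javax.net.ssl.HttpsURLConnection;

// Hypothetical sketch; names and form fields are assumptions.
final class LoginSketch {

    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Encode the form fields as application/x-www-form-urlencoded.
    static String formBody(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            body.add(enc(e.getKey()) + "=" + enc(e.getValue()));
        }
        return body.toString();
    }

    // Step 1: post the login form and return the HTTP status code.
    static int postLogin(String loginUrl, Map<String, String> fields) throws IOException {
        byte[] body = formBody(fields).getBytes(StandardCharsets.UTF_8);
        HttpsURLConnection con = (HttpsURLConnection) new URL(loginUrl).openConnection();
        try {
            con.setRequestMethod("POST");
            con.setDoOutput(true);
            con.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
            try (OutputStream out = con.getOutputStream()) {
                out.write(body);
            }
            return con.getResponseCode();
        } finally {
            con.disconnect(); // the session lives in the cookie, not in the socket
        }
    }

    // Step 3: fetch a page; the default CookieHandler adds the stored cookies.
    static String fetchPage(String pageUrl) throws IOException {
        HttpsURLConnection con = (HttpsURLConnection) new URL(pageUrl).openConnection();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(con.getInputStream(), StandardCharsets.UTF_8))) {
            StringBuilder page = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                page.append(line).append('\n');
            }
            return page.toString();
        } finally {
            con.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 4) {
            System.out.println("usage: LoginSketch <loginUrl> <pageUrl> <user> <password>");
            return;
        }
        // Step 2 is handled here: cookies from responses are kept in memory.
        CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));
        Map<String, String> form = new LinkedHashMap<>();
        form.put("username", args[2]); // assumed field name
        form.put("password", args[3]); // assumed field name
        System.out.println("Login response code: " + postLogin(args[0], form));
        System.out.println(fetchPage(args[1]));
    }
}
```

Letting CookieManager do the bookkeeping avoids copying header values by hand. If a site needs finer control over which cookies are sent, the Cookie header can be set manually instead.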

For scouting, I would recommend using a tool like Fiddler. It captures all communication to and from the web server, so that you can see exactly what is going on at the HTTP level and mimic it in your Java code.

Here is an overview of Fiddler. I have never looked at the logs. Fiddler has a sweet interface. The video is really boring, but it gives an overview of the interface. You want to look at the raw text view and mimic that.
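
One way to mimic the raw view is to paste a captured request into the program and replay its headers. The sketch below parses the header lines of a raw HTTP request and copies them onto a connection with setRequestProperty. The class name RawHeaderReplay, the skip list, and the sample request are all assumptions made for illustration. The skip list is not exhaustive: it leaves out headers the connection manages itself, plus Accept-Encoding, because HttpURLConnection does not decompress a gzip response for you.

```java
import java.net.HttpURLConnection;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of any library.
final class RawHeaderReplay {

    // Headers that are not copied (illustrative list, an assumption).
    private static final Set<String> SKIP = new HashSet<>(Arrays.asList(
            "host", "content-length", "connection", "cookie", "accept-encoding"));

    // Parse the header section of a raw request as shown in a capture tool.
    static Map<String, String> parseRawHeaders(String raw) {
        Map<String, String> headers = new LinkedHashMap<>();
        String[] lines = raw.split("\r?\n");
        for (int i = 1; i < lines.length; i++) { // line 0 is the request line
            String line = lines[i];
            if (line.isEmpty()) {
                break; // a blank line ends the header section
            }
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // not a "Name: value" line
            }
            String name = line.substring(0, colon).trim();
            if (SKIP.contains(name.toLowerCase(Locale.ROOT))) {
                continue;
            }
            headers.put(name, line.substring(colon + 1).trim());
        }
        return headers;
    }

    // Copy the parsed headers onto a connection that has not connected yet.
    static void apply(HttpURLConnection con, Map<String, String> headers) {
        for (Map.Entry<String, String> e : headers.entrySet()) {
            con.setRequestProperty(e.getKey(), e.getValue());
        }
    }

    public static void main(String[] args) {
        // Made-up capture; a real one would be pasted from the raw view.
        String raw = "GET /account HTTP/1.1\r\n"
                + "Host: www.example.com\r\n"
                + "User-Agent: Mozilla/5.0\r\n"
                + "Accept: text/html\r\n"
                + "\r\n";
        System.out.println(parseRawHeaders(raw));
    }
}
```

The Cookie header is skipped on purpose, since a captured session value goes stale; the session cookie should come from your own login request.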

For your other question, OWASP is a great resource for best practices. The reality is that there is a lot of insecure and bad code out there that does things you would never expect. I have seen a server put the boolean value inside a script tag to be stored as a JavaScript variable. You just have to watch carefully how the server changes its responses after you log in. A popular website that follows best practices will use the method above.
