How to Fetch a Webpage Through a TCP Socket Using an HTTP Request in Java

My college assignment is to fetch a web page from any web server by URL using a TCP socket and HTTP GET request.

I am not receiving an HTTP/1.0 200 OK response from any server.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.net.InetAddress;
import java.net.Socket;
import java.net.URL;
import java.util.Scanner;

public class DCCN042 {

    public static void main(String[] args) {
        Scanner inpt = new Scanner(System.in);
        System.out.print("Enter URL: ");
        String url = inpt.next();
        TCPConnect(url);
    }

    public static void TCPConnect(String url) {
        try {
            URL parsed = new URL(url);
            String hostname = parsed.getHost();
            System.out.println("Loading contents of Server: " + hostname);
            InetAddress ia = InetAddress.getByName(hostname);
            String ip = ia.getHostAddress();
            System.out.println(ip + " is IP Adress for  " + hostname);
            String path = parsed.getPath();
            System.out.println("Requested Path on the server: " + path);
            // Plain (unencrypted) TCP connection to port 80, whatever scheme the URL has
            Socket socket = new Socket(ip, 80);
            // Create input and output streams to read from and write to the server
            PrintStream out = new PrintStream(socket.getOutputStream());
            BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
            // A URL without a path (e.g. "http://example.com") must be requested as "/".
            // Strings are compared with equals()/isEmpty(), never with == or !=.
            String requestPath = path.isEmpty() ? "/" : path;
            // Request line; HTTP lines end with CRLF, so print "\r\n" instead of println()
            out.print("GET " + requestPath + " HTTP/1.1\r\n");
            // Header lines
            out.print("Host: " + hostname + "\r\n");
            out.print("User-Agent: Java/13.0.2\r\n");
            out.print("Accept-Language: en-us\r\n");
            out.print("Accept: */*\r\n");
            // keep-alive: the server may leave the connection open after the response,
            // so the read loop below can block until the server times out
            out.print("Connection: keep-alive\r\n");
            // With this header the server may compress the body, so it is not readable text
            out.print("Accept-Encoding: gzip, deflate, br\r\n");
            // Blank line ends the request
            out.print("\r\n");
            out.flush();
            // Read data from the server until we finish reading the document
            String line = in.readLine();
            while (line != null) {
                System.out.println(line);
                line = in.readLine();
            }
            // Close our streams
            in.close();
            out.close();
            socket.close();
        } catch (Exception e) {
            System.out.println("Invalid URL");
            e.printStackTrace();
        }
    }
}

I create a TCP socket with the IP address returned by InetAddress.getHostAddress() and port 80 for the web server. I use getHost() and getPath() to split the URL into hostname and path, and I use that same hostname and path in the HTTP GET request.
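The URL decomposition described above can be checked on its own. This minimal sketch (the class name UrlParts and the helper parse are made up for illustration) prints the parts the question's code uses, plus the scheme and default port that it never looks at. For the https URL from the transcript, getProtocol() reports https and getDefaultPort() reports 443, while the socket is still opened on port 80.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class UrlParts {

    // Hypothetical helper: parses a URL string, turning the checked exception into an unchecked one
    static URL parse(String spec) {
        try {
            return new URL(spec);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid URL: " + spec, e);
        }
    }

    public static void main(String[] args) {
        URL url = parse("https://stackoverflow.com/questions/33015868/java-simple-http-get-request-using-tcp-sockets");

        // getHost() and getPath() are what the question's code uses...
        System.out.println("host:         " + url.getHost());
        System.out.println("path:         " + url.getPath());
        // ...but the scheme is discarded, so the socket always goes to port 80
        System.out.println("protocol:     " + url.getProtocol());
        System.out.println("port in URL:  " + url.getPort());        // -1 when no port is written in the URL
        System.out.println("default port: " + url.getDefaultPort()); // 443 for https, 80 for http
    }
}
```

Note that getPath() returns an empty string for a URL such as http://example.com, which is why a request path of "/" has to be substituted in that case.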

Response from the server:

Enter URL: https://stackoverflow.com/questions/33015868/java-simple-http-get-request-using-tcp-sockets
    Loading contents of Server: stackoverflow.com
    151.101.65.69 is IP Adress for  stackoverflow.com
    Requested Path on the server: /questions/33015868/java-simple-http-get-request-using-tcp-sockets
    HTTP/1.1 301 Moved Permanently
    cache-control: no-cache, no-store, must-revalidate
    location: https://stackoverflow.com/questions/33015868/java-simple-http-get-request-using-tcp-sockets
    x-request-guid: 5f2af765-40c2-49ca-b9a1-daa321373682
    feature-policy: microphone 'none'; speaker 'none'
    content-security-policy: upgrade-insecure-requests; frame-ancestors 'self' https://stackexchange.com
    Accept-Ranges: bytes
    Transfer-Encoding: chunked
    Date: Mon, 27 Dec 2021 15:00:17 GMT
    Via: 1.1 varnish
    Connection: keep-alive
    X-Served-By: cache-qpg1263-QPG
    X-Cache: MISS
    X-Cache-Hits: 0
    X-Timer: S1640617217.166650,VS0,VE338
    Vary: Fastly-SSL
    X-DNS-Prefetch-Control: off
    Set-Cookie: prov=149aa0ef-a3a6-8001-17c1-128d6d4b7273; domain=.stackoverflow.com; expires=Fri, 01-Jan-2055 00:00:00 GMT; path=/; HttpOnly
    
    0

My requirement is to get the HTML source of this web page, along with an HTTP/1.0 200 OK response.

This happens because you are using a plain Socket with the port hard-coded to 80. This means that, regardless of whether you enter an http or an https URL, you are making the request over the insecure http protocol.

In this case the server is telling you, as Samuel L. Jackson would put it, "Hey mf, you're trying to reach me over the f insecure protocol, HTTP. Use the secure mf f HTTPS." In technical terms, it responds with 301 (which just means "use this URL, not the original one"), with the Location header pointing to the correct URL, the https one.

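So the 301 Location looks like the same URL you entered, but it is not the same request: your code hard-codes http (port 80, no TLS), and the server responds with a redirect to https.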
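To detect this case in code instead of by reading the output, a client can check whether the status code on the first response line is in the 3xx range and then read the Location header. The sketch below does only that; the class RedirectCheck and its methods are hypothetical names, and it does not follow the redirect. Header names are matched case-insensitively because HTTP header names are case-insensitive, and the server above sent location in lower case.

```java
import java.util.List;

public class RedirectCheck {

    // Extracts the numeric status code from a line such as "HTTP/1.1 301 Moved Permanently"
    static int statusCode(String statusLine) {
        String[] parts = statusLine.split(" ", 3);
        if (parts.length < 2 || !parts[0].startsWith("HTTP/")) {
            throw new IllegalArgumentException("Not an HTTP status line: " + statusLine);
        }
        return Integer.parseInt(parts[1]);
    }

    // Returns the value of the first header with the given name (case-insensitive), or null
    static String header(List<String> headerLines, String name) {
        for (String line : headerLines) {
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).trim().equalsIgnoreCase(name)) {
                // Everything after the first colon, so "https://..." values stay intact
                return line.substring(colon + 1).trim();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // A few header lines copied from the response shown in the question
        List<String> headers = List.of(
                "cache-control: no-cache, no-store, must-revalidate",
                "location: https://stackoverflow.com/questions/33015868/java-simple-http-get-request-using-tcp-sockets",
                "Connection: keep-alive");
        int code = statusCode("HTTP/1.1 301 Moved Permanently");
        if (code >= 300 && code < 400) {
            System.out.println("Redirected (" + code + ") to " + header(headers, "Location"));
        }
    }
}
```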

To make your code work with https, use the following instead of a plain Socket:

import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;
SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
SSLSocket socket = (SSLSocket) factory.createSocket(ia, 443);

Note that I did not use ip: for https the server's certificate has to match the domain, and if you connect by IP address you get a CertificateExpiredException.
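A fuller sketch of the https variant follows, assuming Java 9 or later. It connects by host name so the TLS handshake can tell the server which domain is wanted, and it asks the socket to verify that the certificate matches the host, which a bare SSLSocket does not do unless the endpoint identification algorithm is set. Unlike the question's request, it sends Connection: close (so the read loop ends when the server closes the connection) and no Accept-Encoding header (so the body is not compressed). The class HttpsGet and its methods are illustrative names, not an established API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import javax.net.ssl.SSLParameters;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

public class HttpsGet {

    // Builds a minimal GET request with CRLF line endings and a terminating blank line
    static String buildRequest(String host, String path) {
        String target = path.isEmpty() ? "/" : path;
        return "GET " + target + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "User-Agent: Java\r\n"
                + "Accept: */*\r\n"
                + "Connection: close\r\n"   // server closes the connection after the response
                + "\r\n";
    }

    static void fetch(String host, String path) throws IOException {
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        // Connecting by host name (not an IP string) lets the handshake send the name to the server
        try (SSLSocket socket = (SSLSocket) factory.createSocket(host, 443)) {
            // Ask TLS to check that the certificate matches the host; the handshake runs on first I/O
            SSLParameters params = socket.getSSLParameters();
            params.setEndpointIdentificationAlgorithm("HTTPS");
            socket.setSSLParameters(params);

            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(host, path).getBytes(StandardCharsets.US_ASCII));
            out.flush();

            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        fetch("stackoverflow.com", "/questions/33015868/java-simple-http-get-request-using-tcp-sockets");
    }
}
```

Because this still sends an HTTP/1.1 request, the server may answer with Transfer-Encoding: chunked as in the question's output, in which case chunk-size lines appear between parts of the HTML. The server may also answer with a status other than 200.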

Now, whether to use a Socket or an SSLSocket is something you have to decide programmatically, based on the URL the user enters.
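One way to make that decision is to pick the socket factory from the URL's scheme and the port from the URL itself, falling back to the scheme's default. This is a sketch under those assumptions; SchemeAwareSocket, parse, portFor, factoryFor and open are hypothetical names. It relies on SSLSocketFactory.getDefault() and SocketFactory.getDefault() both returning a javax.net.SocketFactory, so the rest of the code can treat the returned socket as an ordinary Socket.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.Socket;
import java.net.URL;
import javax.net.SocketFactory;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

public class SchemeAwareSocket {

    // Parses a URL string, turning the checked exception into an unchecked one
    static URL parse(String spec) {
        try {
            return new URL(spec);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid URL: " + spec, e);
        }
    }

    // The explicit port from the URL if present, otherwise the scheme default (80 or 443)
    static int portFor(URL url) {
        return url.getPort() != -1 ? url.getPort() : url.getDefaultPort();
    }

    // Plain sockets for http, TLS sockets for https; other schemes are rejected
    static SocketFactory factoryFor(URL url) {
        switch (url.getProtocol()) {
            case "http":
                return SocketFactory.getDefault();
            case "https":
                return SSLSocketFactory.getDefault();
            default:
                throw new IllegalArgumentException("Unsupported scheme: " + url.getProtocol());
        }
    }

    // Opens a connected socket for the URL; for https the result is an SSLSocket
    static Socket open(URL url) throws IOException {
        return factoryFor(url).createSocket(url.getHost(), portFor(url));
    }

    public static void main(String[] args) throws IOException {
        URL url = parse(args.length > 0 ? args[0] : "https://stackoverflow.com/");
        try (Socket socket = open(url)) {
            System.out.println("Connected to " + socket.getInetAddress() + ":" + socket.getPort()
                    + (socket instanceof SSLSocket ? " over TLS" : " in plain text"));
        }
    }
}
```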
