How to fetch a web page through a TCP socket using an HTTP request in Java
My college assignment is to fetch a web page from any web server by URL, using a TCP socket and an HTTP GET request.
I am not receiving an HTTP/1.0 200 OK response from any server.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.net.InetAddress;
import java.net.Socket;
import java.net.URL;
import java.util.Scanner;
import java.net.*;
public class DCCN042 {
public static void main(String[] args) {
Scanner inpt = new Scanner(System.in);
System.out.print("Enter URL: ");
String url = inpt.next();
TCPConnect(url);
}
public static void TCPConnect(String url) {
try {
String hostname = new URL(url).getHost();
System.out.println("Loading contents of Server: " + hostname);
InetAddress ia = InetAddress.getByName(hostname);
String ip = ia.getHostAddress();
System.out.println(ip + " is IP Adress for " + hostname);
String path = new URL(url).getPath();
System.out.println("Requested Path on the server: " + path);
Socket socket = new Socket(ip, 80);
// Create input and output streams to read from and write to the server
PrintStream out = new PrintStream(socket.getOutputStream());
BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
// Follow the HTTP protocol of GET <path> HTTP/1.0 followed by an empty line
if (hostname != url) {
//Request Line
out.println("GET " + path + " HTTP/1.1");
out.println("Host: " + hostname);
//Header Lines
out.println("User-Agent: Java/13.0.2");
out.println("Accept-Language: en-us");
out.println("Accept: */*");
out.println("Connection: keep-alive");
out.println("Accept-Encoding: gzip, deflate, br");
// Blank Line
out.println();
} else {
//Request Line
out.println("GET / HTTP/1.0");
out.println("Host: " + hostname);
//Header Lines
out.println("User-Agent: Java/13.0.2");
out.println("Accept-Language: en-us");
out.println("Accept: */*");
out.println("Connection: keep-alive");
out.println("Accept-Encoding: gzip, deflate, br");
// Blank Line
out.println();
}
// Read data from the server until we finish reading the document
String line = in.readLine();
while (line != null) {
System.out.println(line);
line = in.readLine();
}
// Close our streams
in.close();
out.close();
socket.close();
} catch (Exception e) {
System.out.println("Invalid URl");
e.printStackTrace();
}
}
}
I create a TCP socket, passing it the IP address I get from InetAddress.getHostAddress() and port 80 for the web server. I use getPath() and getHost() to separate the path and hostname from the URL, and I use that same path and hostname in the HTTP GET request.
Response from the server:
Enter URL: https://stackoverflow.com/questions/33015868/java-simple-http-get-request-using-tcp-sockets
Loading contents of Server: stackoverflow.com
151.101.65.69 is IP Adress for stackoverflow.com
Requested Path on the server: /questions/33015868/java-simple-http-get-request-using-tcp-sockets
HTTP/1.1 301 Moved Permanently
cache-control: no-cache, no-store, must-revalidate
location: https://stackoverflow.com/questions/33015868/java-simple-http-get-request-using-tcp-sockets
x-request-guid: 5f2af765-40c2-49ca-b9a1-daa321373682
feature-policy: microphone 'none'; speaker 'none'
content-security-policy: upgrade-insecure-requests; frame-ancestors 'self' https://stackexchange.com
Accept-Ranges: bytes
Transfer-Encoding: chunked
Date: Mon, 27 Dec 2021 15:00:17 GMT
Via: 1.1 varnish
Connection: keep-alive
X-Served-By: cache-qpg1263-QPG
X-Cache: MISS
X-Cache-Hits: 0
X-Timer: S1640617217.166650,VS0,VE338
Vary: Fastly-SSL
X-DNS-Prefetch-Control: off
Set-Cookie: prov=149aa0ef-a3a6-8001-17c1-128d6d4b7273; domain=.stackoverflow.com; expires=Fri, 01-Jan-2055 00:00:00 GMT; path=/; HttpOnly
0
My requirement is to get the HTML source of this web page, along with an HTTP/1.0 200 OK response.
This happens because you are using a plain Socket with the hard-coded port 80. This means that, regardless of whether the URL in your input uses http or https, you are sending the request over the insecure http protocol.
In this case the server is telling you, as Samuel L. Jackson would put it, "Hey, mf, you are trying to reach me over the f-ing insecure protocol, HTTP; use the secure mf-ing HTTPS." So it responds with 301 (which just means "use this URL, not the original one"), with the Location header pointing to the correct URL, the https one.
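To make the redirect easier to spot in code, here is a minimal sketch of pulling the status code and the Location header out of a response head like the one you printed. The names RedirectInfo, statusCode and location are illustrative and not part of your program, and the sketch assumes the header lines have already been read into a list.

```java
import java.util.List;
import java.util.Optional;

final class RedirectInfo {
    // Extracts the numeric code from a status line such as "HTTP/1.1 301 Moved Permanently".
    static int statusCode(String statusLine) {
        String[] parts = statusLine.trim().split("\\s+", 3);
        if (parts.length < 2) {
            throw new IllegalArgumentException("Not an HTTP status line: " + statusLine);
        }
        return Integer.parseInt(parts[1]);
    }

    // Returns the Location header value if present; header names are case-insensitive.
    static Optional<String> location(List<String> headerLines) {
        for (String line : headerLines) {
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).trim().equalsIgnoreCase("location")) {
                return Optional.of(line.substring(colon + 1).trim());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> headers = List.of(
            "cache-control: no-cache, no-store, must-revalidate",
            "location: https://stackoverflow.com/questions/33015868/java-simple-http-get-request-using-tcp-sockets");
        System.out.println(statusCode("HTTP/1.1 301 Moved Permanently"));
        System.out.println(location(headers).orElse("(none)"));
    }
}
```

A status code in the 3xx range together with a Location value that starts with https is the sign that the request went out over plain http.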
So the 301 Location looks like the same URL, but it is not: your code hard-codes http, and the server responds with a redirect to https.
To make your code work with https, use the following instead of a plain Socket:
SSLSocketFactory factory = (SSLSocketFactory)SSLSocketFactory.getDefault();
SSLSocket socket = (SSLSocket)factory.createSocket(ia, 443);
Note that I am not using ip: for https the certificate has to correspond to the domain, and if you use the IP you will get a CertificateExpiredException.
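Putting the pieces together, a complete HTTPS GET over an SSLSocket might look like the sketch below. It rests on several assumptions of mine rather than on the snippet above: it connects by hostname string instead of ia, sends "Connection: close" so that the read loop ends when the server closes the connection, leaves out Accept-Encoding so the body is not compressed, and writes CRLF line endings explicitly because println uses the platform line separator. The class name HttpsGet and the helper buildRequest are illustrative.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

final class HttpsGet {
    // Builds a minimal GET request: CRLF line endings, terminated by a blank line.
    static String buildRequest(String host, String path) {
        String target = path.isEmpty() ? "/" : path; // an empty path must be sent as "/"
        return "GET " + target + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n"
             + "\r\n";
    }

    // Opens a TLS connection on port 443, sends the request, prints the raw response.
    static void fetch(String host, String path) throws IOException {
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        try (SSLSocket socket = (SSLSocket) factory.createSocket(host, 443)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(host, path).getBytes(StandardCharsets.US_ASCII));
            out.flush();
            BufferedReader in = new BufferedReader(
                new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        fetch("stackoverflow.com",
              "/questions/33015868/java-simple-http-get-request-using-tcp-sockets");
    }
}
```

Because the request is HTTP/1.1, the server may still send the body with Transfer-Encoding: chunked, in which case chunk-size lines appear between pieces of the HTML.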
Now, whether to use Socket or SSLSocket is something you have to manage programmatically, based on the user's input.
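One way to make that choice is to look at the scheme of the URL the user typed. The sketch below is only an illustration: SocketChooser, isSecure, portFor and open are made-up names, and it parses the input with java.net.URI rather than the java.net.URL class used in the question.

```java
import java.io.IOException;
import java.net.Socket;
import java.net.URI;
import javax.net.ssl.SSLSocketFactory;

final class SocketChooser {
    // True when the scheme calls for TLS.
    static boolean isSecure(URI uri) {
        return "https".equalsIgnoreCase(uri.getScheme());
    }

    // Uses the explicit port if the URL has one, otherwise 443 for https and 80 for http.
    static int portFor(URI uri) {
        if (uri.getPort() != -1) {
            return uri.getPort();
        }
        return isSecure(uri) ? 443 : 80;
    }

    // Opens an SSLSocket for https and a plain Socket otherwise, always by hostname.
    static Socket open(URI uri) throws IOException {
        String host = uri.getHost();
        int port = portFor(uri);
        if (isSecure(uri)) {
            return SSLSocketFactory.getDefault().createSocket(host, port);
        }
        return new Socket(host, port);
    }

    public static void main(String[] args) {
        URI uri = URI.create("https://stackoverflow.com/questions");
        System.out.println(isSecure(uri) + " " + portFor(uri));
    }
}
```

Since SSLSocket is a subclass of Socket, the rest of the program can keep reading and writing through the same stream code whichever branch is taken.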