Python - Send HTTP GET string - Receive 301 Moved Permanently - What's next?

I'm trying to use Python 2 to send my own HTTP GET message to a web server, retrieve the HTML text, and write it to an HTML file (no urllib, urllib2, httplib, requests, etc. allowed).

import socket 
tcpSocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcpSocket.connect(('python.org', 80))

http_get = """GET / HTTP/1.1\r
Host: www.python.org/\r
Connection: keep-alive\r
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8\r
Upgrade-Insecure-Requests: 1\r
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36\r
Accept-Encoding: gzip, deflate, sdch\r
Accept-Language: en-US,en;q=0.8\r\n\r\n"""

tcpSocket.send(http_get)
m = tcpSocket.recv(4096)
tcpSocket.close()

print m

Output:

HTTP/1.1 301 Moved Permanently
Location: https://www.python.org//
Connection: Keep-Alive
Content-length: 0

Why does it return 301 when the location is apparently still the same? What message and to where should I send next to get the html content?

Thank you very much!

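Your problem is that the URL you are requesting is not served over http://; the server redirects to https:// instead. Also note that the Host header should contain only the host name (www.python.org, with no trailing slash); the stray slash is most likely why the Location header ends in a double slash. To show that your code fundamentally works against a target that does serve plain HTTP, I changed your GET request to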

import socket
tcpSocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcpSocket.connect(('www.cnn.com', 80))

http_get = """GET / HTTP/1.1\r
Host: www.cnn.com\r
Connection: keep-alive\r
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8\r
Upgrade-Insecure-Requests: 1\r
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36\r
Accept-Encoding: gzip, deflate, sdch\r
Accept-Language: en-US,en;q=0.8\r\n\r\n"""

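# A minimal request is enough: request line, Host header, and Connection: close
http_get_minimum = """GET / HTTP/1.1\r\nHost: www.cnn.com\r\nConnection: close\r\n\r\n"""

tcpSocket.sendall(http_get_minimum)  # sendall() keeps sending until the whole request is written
m = tcpSocket.recv(4096)  # returns at most 4096 bytes of the response
tcpSocket.close()

print m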

and received

HTTP/1.1 200 OK
x-servedByHost: prd-10-60-168-42.nodes.56m.dmtio.net
Cache-Control: max-age=60
X-XSS-Protection: 1; mode=block
Content-Security-Policy: default-src 'self' http:// .cnn.com: https:// .cnn.com: .cnn.net: .turner.com: .ugdturner.com: .vgtf.net: ; script-src 'unsafe-inline' 'unsafe-eval' 'self' *; style-src 'unsafe-inline' 'self' *; frame-src 'self' *; object-src 'self' *; img-src 'self' * data:; media-src 'self' *; font-src 'self' *; connect-src 'self' *;
Content-Type: text/html; charset=utf-8
Via: 1.1 varnish
Content-Length: 74864
Accept-Ranges: bytes
Date: Mon, 05 Oct 2015 00:39:54 GMT
Via: 1.1 varnish
Age: 170
Connection: close
X-Served-By: cache-iad2144-IAD, cache-sjc3129-SJC
X-Cache: HIT, HIT
X-Cache-Hits: 2, 95
X-Timer: S1444005594.675567,VS0,VE0
Vary: Accept-Encoding
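Note that a single recv(4096) call returns at most 4096 bytes, while the response above announces Content-Length: 74864, so one call cannot return the whole page. A minimal sketch of reading until the server closes the connection (which Connection: close asks for) and then separating the headers from the body might look like this. The helper names recv_all and split_response are hypothetical, and the sketch assumes the server does not use chunked transfer encoding.

```python
import socket


def recv_all(sock, bufsize=4096):
    """Read from sock until the peer closes the connection."""
    chunks = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:  # an empty result means the peer closed its side
            break
        chunks.append(chunk)
    return b"".join(chunks)


def split_response(raw):
    """Split a raw HTTP response into (header block, body) at the first blank line."""
    head, _, body = raw.partition(b"\r\n\r\n")
    return head, body
```

With the socket from the code above, calling split_response(recv_all(tcpSocket)) after sending the request would give the header block and the HTML body, and the body could then be written to a file opened in 'wb' mode.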

UPDATE: Yes, requesting over HTTPS needs functionality beyond what your code has. The first difference is the default port: 80 for HTTP, 443 for HTTPS. HTTPS carries ordinary HTTP messages through an encrypted channel so that, in theory, no party other than the client and the end server can read them. The encryption layer is Transport Layer Security (TLS); its predecessor, Secure Sockets Layer (SSL), is deprecated, although the name is still widely used. Both encrypt the data records being exchanged.
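To answer "what message and to where should I send next", the Location header of the 301 response tells you the scheme, host, and path, and the scheme implies the default port. The following is a rough sketch of pulling those out by hand, without urllib. The function name parse_redirect is hypothetical, and it assumes the Location value is an absolute http:// or https:// URL.

```python
def parse_redirect(raw_response):
    """Return (scheme, host, port, path) for a 3xx response, else None.

    Assumes the Location header holds an absolute http:// or https:// URL.
    """
    head = raw_response.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    parts = lines[0].split(" ", 2)  # e.g. ["HTTP/1.1", "301", "Moved Permanently"]
    if len(parts) < 2 or not parts[1].startswith("3"):
        return None
    location = None
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "location":
            location = value.strip()
            break
    if location is None:
        return None
    scheme, sep, rest = location.partition("://")
    if not sep:
        return None  # relative redirect: not handled in this sketch
    host, slash, path = rest.partition("/")
    port = 443 if scheme.lower() == "https" else 80  # default port per scheme
    if ":" in host:  # explicit port such as example.org:8443
        host, _, port_text = host.partition(":")
        port = int(port_text)
    return scheme.lower(), host, port, (slash + path) or "/"
```

For the 301 response shown in the question, this gives the scheme https, the host www.python.org, and port 443, so the next request has to go to port 443 and be sent through TLS rather than over a plain socket.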

When opening an HTTPS connection, the client begins the handshake by sending a list of the protocol versions and cipher suites it supports. The server selects one and sends its certificate so that the client can authenticate it; the client may also present a certificate, but that is optional and uncommon on the public web. Once both sides have derived the same session keys, they exchange the encrypted HTTP messages, and then the connection is closed. To host HTTPS connections, a server must have a public key certificate, which binds a public key to the verified identity of the key's owner. Most certificates are signed by a third-party certificate authority, so that clients can trust that the key really belongs to the server.
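In Python, the handshake described above does not have to be written by hand: the standard-library ssl module can wrap an ordinary TCP socket. Below is a minimal sketch, assuming the ssl module is permitted under the asker's restrictions and that ssl.create_default_context is available (Python 2.7.9+ or Python 3.4+). The function names build_get and https_get are hypothetical, and error handling is left out.

```python
import socket
import ssl


def build_get(host, path="/"):
    """Build a minimal HTTP/1.1 GET request as bytes."""
    request = "GET {0} HTTP/1.1\r\nHost: {1}\r\nConnection: close\r\n\r\n".format(path, host)
    return request.encode("ascii")


def https_get(host, path="/", port=443, timeout=10):
    """Send a GET over a TLS-wrapped TCP socket and return the raw response bytes."""
    context = ssl.create_default_context()  # verifies the server certificate and host name
    raw_sock = socket.create_connection((host, port), timeout=timeout)
    try:
        tls_sock = context.wrap_socket(raw_sock, server_hostname=host)  # TLS handshake
    except Exception:
        raw_sock.close()
        raise
    try:
        tls_sock.sendall(build_get(host, path))
        chunks = []
        while True:
            chunk = tls_sock.recv(4096)
            if not chunk:  # server closed the connection
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        tls_sock.close()
```

Calling https_get('www.python.org') would then return the status line, headers, and body as one byte string. The request itself is the same plain HTTP text as before; only the transport underneath it changes.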

I had the same problem; changing the port from 80 to 443 solved it.
