
socket.connect() gives socket.gaierror: [Errno 11004] getaddrinfo failed

I have already checked answers regarding my problem, but I couldn't find what's wrong. I am new to Python and that might be a problem. I have written this simple code to connect to a site, but I get this error:

socket.gaierror: [Errno 11004] getaddrinfo failed

This is my code:

import socket
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('http://www.py4e.com', 80))
mysock.send('GET http://www.py4e.com/code3/mbox-short.txt HTTP/1.0\n\n')
while True:
    data = mysock.recv(512)        
    if(len(data) < 1):
        break
    print (data)
mysock.close()

Quite simply: don't put http:// in the host you pass to .connect().
http:// is a protocol and www.py4e.com is a host (or A record in a DNS server). The standard socket library doesn't know anything about protocols and therefore requires only a host and a port number. Given 'http://www.py4e.com', it tries to resolve that entire string as a hostname, the lookup fails, and you get the getaddrinfo error.
If you want this handled for you, check out urllib.request, or @Mego's answer using Requests, which handles the connection and HTTP parsing for you.
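
As a rough illustration of the host/protocol split, here is a minimal sketch using urllib.parse.urlsplit from the standard library to pull out the (host, port) pair that .connect() wants. The helper names host_and_port and fetch_with_urllib are made up for this example, and the port fallback (80 for http, 443 for https) is an assumption about what you'd want by default:

```python
import urllib.request
from urllib.parse import urlsplit

URL = 'http://www.py4e.com/code3/mbox-short.txt'

def host_and_port(url):
    """Hypothetical helper: return the (host, port) pair socket.connect() expects."""
    parts = urlsplit(url)
    # parts.port is None when the URL has no explicit port,
    # so fall back to the usual default for the scheme (an assumption).
    default_port = 443 if parts.scheme == 'https' else 80
    return parts.hostname, parts.port or default_port

def fetch_with_urllib(url):
    """Hypothetical helper: let urllib.request do the connection and HTTP parsing."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode('utf-8')

# Only the URL is split here; no network access is needed for this line.
print(host_and_port(URL))  # ('www.py4e.com', 80)
```

Calling fetch_with_urllib(URL) would perform the actual download, so it needs a working network connection.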

Also, if you're using Python 3 (which you probably should be), you need to send bytes when calling .send().

There are two ways of converting your string to bytes:

mysock.send(b'GET http://www.py4e.com/code3/mbox-short.txt HTTP/1.0\n\n')
mysock.send(bytes('GET http://www.py4e.com/code3/mbox-short.txt HTTP/1.0\n\n', 'UTF-8'))

Both do basically the same thing.
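
If you want to convince yourself, this small check compares the two forms. The third variant, str.encode(), isn't mentioned above; it's included only as another common spelling of the same conversion:

```python
request_text = 'GET http://www.py4e.com/code3/mbox-short.txt HTTP/1.0\n\n'

# Form 1: a bytes literal.
as_literal = b'GET http://www.py4e.com/code3/mbox-short.txt HTTP/1.0\n\n'
# Form 2: bytes() with an explicit encoding.
as_bytes_call = bytes(request_text, 'UTF-8')
# Extra (not from the answer above): str.encode() gives the same result.
as_encode = request_text.encode('utf-8')

print(as_literal == as_bytes_call == as_encode)  # True
```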

Finally, the GET request itself shouldn't contain http:// either.
Instead, you just send the path of the file you want to retrieve:

mysock.send(b'GET /code3/mbox-short.txt HTTP/1.0\n\n')

The reason is (again) that http:// is a protocol descriptor and not part of the actual protocol data being sent. You also don't need the host in your GET line, because the server you connected to already knows which host you're on, since you're... connected to it.
Instead, the server expects you to supply a Host: <hostname>\r\n header if it is serving multiple virtual hosts.
You might need a few other headers, though, to be able to request actual content from certain web servers.
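
Putting the pieces together, a request with a Host header might look roughly like this. It's a sketch, not the one true way: build_get_request and fetch are names invented for this example, and it ends lines with \r\n (what the HTTP spec asks for) rather than the bare \n used in the snippets above:

```python
import socket

def build_get_request(host, path):
    """Hypothetical helper: build a raw HTTP/1.0 GET request as bytes."""
    lines = [
        f'GET {path} HTTP/1.0',   # path only, no http:// and no hostname
        f'Host: {host}',          # tells a virtual-hosting server which site you want
        '',                       # blank line marks the end of the headers
        '',
    ]
    return '\r\n'.join(lines).encode('ascii')

def fetch(host, path, port=80):
    """Hypothetical helper: send the request and return the raw response bytes."""
    # The host here is a bare hostname, without http://.
    with socket.create_connection((host, port)) as sock:
        sock.sendall(build_get_request(host, path))
        chunks = []
        while True:
            data = sock.recv(512)
            if not data:          # empty bytes means the server closed the connection
                break
            chunks.append(data)
    return b''.join(chunks)

print(build_get_request('www.py4e.com', '/code3/mbox-short.txt'))
```

With a network connection, fetch('www.py4e.com', '/code3/mbox-short.txt') would return the response headers and body as a single bytes object.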

But this is the basic gist of things.

Continue reading

Here's a good start:

It shows you what a raw GET request looks like.
And in the future, I recommend using your browser's built-in Network Debugger, which can show raw headers, raw responses and a whole bunch of other things.
