
Python Decoding/Encoding Problems

I know that a lot of people on the Internet have reported problems with string encodings in Python, but no matter what I try, I can't figure out how to fix mine. Essentially, I'm using TCP sockets to connect to a web server and then send it an HTTP request. I read the response into a series of buffers that I decode and concatenate to build the complete response as a string. When I get the response, however, I'm getting a UnicodeDecodeError. I want my program to work with many different websites, so is there a solution to this problem that would work with just about any site I give it?

Thank you for your time.

Some code:

def getAllFromSocket(socket):
    '''Reads all data from a socket and returns a string of it.'''
    more_bytes = True
    message = ''
    if socket is not None:
        while more_bytes:
            buffer = socket.recv(1024)
            if len(buffer) == 0:
                more_bytes = False
            else:
                message += buffer.decode('utf-8')
    return message

So when I do this:

received_message = getAllFromSocket(my_sock)

I get:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xd0 in position 1023: unexpected end of data

You can try detecting the encoding of the data with UnicodeDammit (part of Beautiful Soup) to make sure you're actually getting UTF-8. You can also choose to ignore decoding errors, which silently drops any bytes that cannot be decoded:

buffer.decode("utf-8", "ignore")
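Note that the error in the question is "unexpected end of data" at position 1023, the last byte of a 1024-byte chunk, and 0xd0 is the first byte of a two-byte UTF-8 sequence. That suggests the data may well be valid UTF-8 and that a character was simply split across two recv() calls. One possible way around this is to let an incremental decoder carry the partial character over to the next chunk. The following is only a sketch: the name get_all_from_socket and the encoding and errors defaults are my own choices, and it assumes the server closes the connection when the response is complete.

```python
import codecs


def get_all_from_socket(sock, encoding="utf-8", errors="replace"):
    """Read until the peer closes the connection and return the decoded text."""
    # The incremental decoder buffers an incomplete multi-byte sequence at the
    # end of one chunk and finishes it when the next chunk arrives.
    decoder = codecs.getincrementaldecoder(encoding)(errors=errors)
    parts = []
    while True:
        chunk = sock.recv(1024)
        if not chunk:  # an empty bytes object means the peer closed the socket
            break
        parts.append(decoder.decode(chunk))
    # final=True flushes the decoder; leftover bytes are handled per `errors`.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)
```

With errors="replace", bytes that really are not valid in the chosen encoding show up as U+FFFD instead of raising or vanishing. Not every site serves UTF-8, so for arbitrary sites it is probably better to pass the charset from the response's Content-Type header (or a detector's guess) as encoding rather than rely on the default.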


 