I know that a lot of people on the Internet have reported problems with string encodings in Python, but no matter what I try, I can't figure out how to fix mine. Essentially, I'm using TCP sockets to connect to a web server and then sending that server an HTTP request. I read the response into a series of buffers, which I decode and concatenate to build the complete response as a string. When I read the response, however, I get a UnicodeDecodeError. I want to use my program on many different websites, so is there a solution that would work with just about any site I give it?
Thank you for your time.
Some code:
def getAllFromSocket(socket):
    '''Reads all data from a socket and returns a string of it.'''
    more_bytes = True
    message = ''
    if(socket!=None):
        while(more_bytes):
            buffer = socket.recv(1024)
            if len(buffer) == 0:
                more_bytes = False
            else:
                message += buffer.decode('utf-8')
    return message
So when I do this:
received_message = getAllFromSocket(my_sock)
I get:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xd0 in position 1023: unexpected end of data
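The error is reported at position 1023, the last byte of a 1024-byte buffer, with "unexpected end of data". That suggests a multi-byte character was split across two recv() calls, so decoding only after all the bytes have been joined should avoid it. Beyond that, you can try detecting the encoding of the data with UnicodeDammit, and make sure you are actually getting UTF-8. You can also choose to ignore errors: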
buffer.decode("utf-8", "ignore")
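Ignoring errors per buffer silently drops any character that happens to straddle a 1024-byte boundary. A minimal sketch of the alternative, assuming the server closes the connection when it is done: collect raw bytes first and decode once, or use an incremental decoder if you need text as the data arrives. The names read_all_bytes, decode_response and decode_chunks are made up for this illustration, and UTF-8 is only a default; a page in another encoding needs its declared charset instead.

```python
import codecs


def read_all_bytes(sock, bufsize=1024):
    """Read raw bytes from sock until the peer closes the connection."""
    chunks = []
    while True:
        buffer = sock.recv(bufsize)
        if not buffer:  # recv() returns b'' once the peer has closed
            break
        chunks.append(buffer)
    return b"".join(chunks)


def decode_response(raw, encoding="utf-8"):
    """Decode the complete byte string once.

    errors="replace" turns undecodable bytes into U+FFFD instead of raising.
    """
    return raw.decode(encoding, errors="replace")


def decode_chunks(chunks, encoding="utf-8"):
    """Decode chunk by chunk, carrying partial characters over to the next chunk."""
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    parts = [decoder.decode(chunk) for chunk in chunks]
    parts.append(decoder.decode(b"", final=True))  # flush any trailing bytes
    return "".join(parts)
```

With these, the call from the question would become received_message = decode_response(read_all_bytes(my_sock)).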