Python socket UTF-8 decoding

Question

I want to get the response of a GET request into a string, and I came up with the following code:

 import socket

 target_host = "www.google.com"
 target_port = 80

 # create a socket object
 s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

 # connect the client
 s.connect((target_host, target_port))

 # send some data
 request = "GET / HTTP/1.1\r\nHost:%s\r\n\r\n" % target_host
 s.send(request.encode("utf-8"))

 full_msg = ""
 # Set a timeout so recv() does not block forever waiting for more data when there is none.
 s.settimeout(1)
 flag = True
 while flag:
     # receive some data
     try:
         response = s.recv(4096)
         full_msg = full_msg + str(response)
         print("Adding msg")
     except Exception as e:
         print(full_msg)
         flag = False
         print(e)
 print("Loop ended")
 print(type(full_msg))

The thing is, when I try to decode the response by replacing the s.recv(4096) line with this code:

 response = s.recv(4096).decode("utf-8")

I get the following exception:

 'utf-8' codec can't decode byte 0xe8 in position 1025: invalid continuation byte

I don't know how to fix this, since I can't modify the characters I get from the response, and I need to get rid of the b'...' wrapper that messes up my full_msg string when I don't decode each chunk in the loop.

Also, the docs say that the .recv() method returns a string, but I seem to be getting a bytes-like object. Any ideas are welcome, and I am also open to hearing how my code could be improved.

Answer 1

Per the data in your response, the header is Content-Type: text/html; charset=ISO-8859-1. The response is not UTF-8, so use .decode('iso-8859-1') instead. Better yet, use requests:

import requests
r = requests.get('https://www.google.com')
print(r.status_code)
print(r.headers['content-type'])
print(r.encoding)
# print(r.text)    # already decoded with r.encoding
# print(r.content) # byte string of original encoded data
# print(r.json())  # If the response is JSON data, This loads the data.
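
If you want to stay with raw sockets, here is a minimal sketch of one way to do it. In Python 3, recv() returns bytes, and str() on a bytes object gives its repr (the b'...' text), so it is usually easier to collect the raw bytes first and decode once at the end. The helper names below (split_response, charset_from_headers, decode_response) are hypothetical, the chunks list stands in for what successive recv() calls might return, and the ISO-8859-1 fallback is an assumption rather than something the response guarantees.

```python
import re


def split_response(raw: bytes) -> tuple[bytes, bytes]:
    """Split a raw HTTP response into header bytes and body bytes."""
    head, _, body = raw.partition(b"\r\n\r\n")
    return head, body


def charset_from_headers(head: bytes, default: str = "iso-8859-1") -> str:
    """Return the charset named in the Content-Type header, else the default.

    The default is an assumption made for this sketch.
    """
    # Decoding with iso-8859-1 never fails, so it is safe for inspecting headers.
    text = head.decode("iso-8859-1")
    match = re.search(
        r'^content-type:.*?charset="?([\w.:-]+)',
        text,
        re.IGNORECASE | re.MULTILINE,
    )
    return match.group(1) if match else default


def decode_response(raw: bytes) -> str:
    """Decode only the body, using the charset the server declared."""
    head, body = split_response(raw)
    # errors="replace" keeps going if a byte does not fit the declared charset.
    return body.decode(charset_from_headers(head), errors="replace")


# Simulated recv() results (hypothetical data, not a real Google response).
chunks = [
    b"HTTP/1.1 200 OK\r\nContent-Type: text/html; charset=ISO-8859-1\r\n",
    b"\r\n<p>caf\xe8</p>",
]

# Join the bytes first; calling str() on each chunk is what adds the b'...' text.
raw = b"".join(chunks)

print(charset_from_headers(split_response(raw)[0]))
print(decode_response(raw))
```

In the real loop this would mean appending each s.recv(4096) result to a list (and stopping when recv() returns b"" or the timeout fires), then calling decode_response on the joined bytes. The sketch does not handle chunked transfer encoding or compressed bodies, which is one more reason requests is the more comfortable choice.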

Answer 1: 2020-03-04 01:47:05
