Adding a time.sleep to a multithreaded program solves a UnicodeDecodeError in Python

Here's the basic structure of the threads that I create in my program:

 Main thread
        |
   ListenerCreator (the WebSocketServer thread)  ---> several listener threads (using log())

So the main thread creates a ListenerCreator thread, which connects to a number of clients and creates a listener thread for each client.

EDIT1: I'm using WebSockets to read/write data from my clients, and I've written my own server for this purpose. It uses the framing protocol that the standard specifies. On the client side I simply call WebSocket.send(); the server "unmasks" the messages according to the instructions given in the protocol (see section 5.3 of the WebSocket standard, RFC 6455). I'm willing to provide the server code if someone requests it; in the meantime, here's a brief outline of the server and of what a listener thread does:

class WebSocketServer:
    def start(self):
        # Open server socket, bind to host:port
        while True:
            # Accept client socket, start a new listener thread for self.log(client)
            ...

    def log(self, client):
        # Receive data using socket.socket.recv(1024)
        # Unmask data as per the protocol
        # Decode using data.decode("utf-8")
        # Append to data_q while holding data_q_lock
        ...

There are other methods as well, for sending, closing, handshaking and so on.
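For reference, the "unmask data as per the protocol" step follows RFC 6455: section 5.2 defines the frame header (opcode, MASK bit, 7-bit length with 16- or 64-bit extensions) and section 5.3 says each payload octet i is XORed with masking-key octet i mod 4. The following is a minimal sketch under stated assumptions, not the asker's server code: the name `parse_frame` and its `(opcode, payload, bytes_consumed)` return shape are hypothetical, and it ignores FIN/fragmentation and control-frame rules. It returns `None` when the bytes received so far do not yet make up a whole frame, which a single `recv(1024)` call does not guarantee.

```python
import struct
from typing import Optional, Tuple


def parse_frame(buf: bytes) -> Optional[Tuple[int, bytes, int]]:
    """Parse one WebSocket frame from the start of buf (hypothetical helper).

    Returns (opcode, unmasked_payload, bytes_consumed), or None if buf does
    not yet contain the complete frame. FIN/fragmentation is ignored.
    """
    if len(buf) < 2:
        return None
    opcode = buf[0] & 0x0F
    masked = buf[1] & 0x80
    length = buf[1] & 0x7F
    pos = 2
    if length == 126:  # 16-bit extended payload length follows
        if len(buf) < pos + 2:
            return None
        (length,) = struct.unpack_from("!H", buf, pos)
        pos += 2
    elif length == 127:  # 64-bit extended payload length follows
        if len(buf) < pos + 8:
            return None
        (length,) = struct.unpack_from("!Q", buf, pos)
        pos += 8
    mask = b""
    if masked:
        if len(buf) < pos + 4:
            return None
        mask = buf[pos:pos + 4]
        pos += 4
    if len(buf) < pos + length:
        return None  # payload has not fully arrived yet
    payload = buf[pos:pos + length]
    if masked:
        # RFC 6455 section 5.3: payload octet i is XORed with mask octet i % 4
        payload = bytes(b ^ mask[i % 4] for i, b in enumerate(payload))
    return opcode, payload, pos + length
```

Under this sketch, a listener would append each `recv()` result to a per-client buffer, call `parse_frame` until it returns `None`, and only then call `.decode("utf-8")` on each complete payload.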

Meanwhile, in the main thread:

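import time  # only needed if the sleep below is uncommented

breaking = 0
while breaking < len(client_list):
    #time.sleep(0.5)
    with data_q_lock:
        # Drain every message the listener threads have queued so far
        for _ in range(len(data_q)):
            mes = data_q.pop()
            # Count "#DONE" messages; the loop ends once there is one per client
            if mes == "#DONE":
                breaking += 1
            if mes.startswith("#COUNT:"):
                print(mes)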

So basically, this loop goes through data_q, prints every message that starts with "#COUNT:", and exits after receiving a certain number of "#DONE" messages.

If the time.sleep is uncommented, this code works; without it, I get a UnicodeDecodeError in the log function. I only get the error sometimes, though; sometimes the program works perfectly. (The client sends the same data every time, by the way.) So my question is: why is the time.sleep required? I thought it had something to do with the GIL in Python, since time.sleep releases the GIL, but even after reading about it I couldn't figure it out.

Currently there is no information about how the listener reads data off the socket. It seems likely, however, that this is caused by the usual misunderstanding of sockets.

Data sent down a stream (TCP) socket is not "framed" in any way by the socket. Imagine I sent the message "hello" three times down a socket. Then, like writing to a file without line breaks, the following would flow over the socket:

hellohellohello

Now consider the reader: when it reads the data, how does it know where one message ("hello") ends and the next begins? It cannot, unless the sender and receiver agree on how the data should be "framed". This could be done by agreeing on a protocol such as:

  • null-terminated data; or
  • fixed-size messages; or
  • size-prefixed messages.
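As an illustration of the size-prefixed option, here is a minimal sketch for a plain TCP socket, assuming both ends agree on a 4-byte big-endian length header. This is not the WebSocket framing (which defines its own length field); the helper names `send_msg`, `recv_exact` and `recv_msg` are hypothetical. Note that the payload is decoded only after every byte of it has arrived.

```python
import socket
import struct


def send_msg(sock: socket.socket, text: str) -> None:
    """Send text as UTF-8, prefixed with its length as a 4-byte big-endian int."""
    payload = text.encode("utf-8")
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Call recv() repeatedly until exactly n bytes have been read."""
    chunks = []
    remaining = n
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket closed in the middle of a message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_msg(sock: socket.socket) -> str:
    """Read one length-prefixed message and decode it once it is complete."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length).decode("utf-8")


if __name__ == "__main__":
    a, b = socket.socketpair()
    for _ in range(3):
        send_msg(a, "héllo")
    print([recv_msg(b) for _ in range(3)])  # ['héllo', 'héllo', 'héllo']
    a.close()
    b.close()
```

Even if all three messages arrive in a single `recv()` buffer, the length header tells the reader exactly where each one ends.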

It gets more complicated, of course: even once you've decided how the data should be framed, you cannot guarantee that socket.recv will return a "whole" message. It simply returns whatever data happens to be in the buffer at the time. That may be half a message, or a message and a half. It's your job to collate the data read from the socket and divide it into messages.
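The collating step can be sketched for the null-terminated option: keep a buffer per connection, append every `recv()` chunk to it, and cut out only the messages whose terminator has arrived. The class name `NullFramer` is hypothetical, and the sketch assumes the payload itself never contains a zero byte.

```python
from typing import List


class NullFramer:
    """Reassemble null-terminated messages from arbitrary recv() chunks."""

    def __init__(self) -> None:
        self._buffer = bytearray()

    def feed(self, chunk: bytes) -> List[bytes]:
        """Add a chunk and return every message it completed (possibly none)."""
        self._buffer += chunk
        messages = []
        while True:
            end = self._buffer.find(b"\0")
            if end == -1:
                break  # leftover bytes are an incomplete message; keep them
            messages.append(bytes(self._buffer[:end]))
            del self._buffer[:end + 1]  # drop the message and its terminator
        return messages


framer = NullFramer()
print(framer.feed(b"hel"))          # []
print(framer.feed(b"lo\0hel"))      # [b'hello']
print(framer.feed(b"lo\0hello\0"))  # [b'hello', b'hello']
```

Each returned message is complete, so calling `.decode("utf-8")` on it cannot fail because of a split character.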

Turning to your problem, where you are sending UTF-8 data: how does the reader know it has read a full UTF-8 message? Most likely, what is happening here is that you have received only part of a message, and there is still more to arrive.

In particular, a valid UTF-8 character may consist of more than one byte (up to four). So if your partial message ends in the middle of a character's multi-byte UTF-8 encoding, you certainly cannot decode it: data.decode("utf-8") raises a UnicodeDecodeError.
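The failure is easy to reproduce without any threads. The sketch below splits the UTF-8 bytes of "héllo" inside the two-byte "é", as an unlucky `recv()` might, and shows that decoding the first chunk on its own raises. `codecs.getincrementaldecoder` (standard library) is one way to tolerate such splits, but it only avoids the decode error; it does not tell you where a message ends, so proper framing is still needed.

```python
import codecs

data = "héllo".encode("utf-8")      # b'h\xc3\xa9llo': 'é' takes two bytes
first, second = data[:2], data[2:]  # simulate recv() splitting inside 'é'

# Decoding the first chunk on its own fails, just as in log().
try:
    first.decode("utf-8")
except UnicodeDecodeError as exc:
    print("decode failed:", exc)

# An incremental decoder holds back the incomplete byte until the rest arrives.
decoder = codecs.getincrementaldecoder("utf-8")()
text = decoder.decode(first) + decoder.decode(second, final=True)
print(text)  # héllo
```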

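Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.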

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM