Python - Non-empty shared list on separate thread appears empty

I have two classes: MessageProducer and MessageConsumer.

MessageConsumer does the following:

  1. receives messages and puts them in its message list "_unprocessed_msgs"
  2. on a separate worker thread, moves the messages to internal list "_in_process_msgs"
  3. on the worker thread, processes messages from "_in_process_msgs"

In my development environment, I'm facing an issue with #2 above: after adding a message in step #1, the worker thread checks the length of "_unprocessed_msgs" and gets zero. When step #1 is repeated, the list correctly shows 2 items on the thread on which the items were added. But in step #2, on the worker thread, len(_unprocessed_msgs) again returns zero.

Not sure why this is happening. I would really appreciate any help on this.

I'm using Ubuntu 16.04 with Python 2.7.12.

Below is the sample source code. Please let me know if more information is required.

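import logging
import threading
import time

# Assumed logging setup: the original post does not show how LOG was defined.
logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
LOG = logging.getLogger(__name__)

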
class MessageConsumerThread(threading.Thread):
    def __init__(self):
        super(MessageConsumerThread, self).__init__()
        self._unprocessed_msg_q = []
        self._in_process_msg_q = []
        self._lock = threading.Lock()
        self._stop_processing = False

    def start_msg_processing_thread(self):
        self._stop_processing = False
        self.start()

    def stop_msg_processing_thread(self):
        self._stop_processing = True

    def receive_msg(self, msg):
        with self._lock:
            LOG.info("Before: MessageConsumerThread::receive_msg: "
                     "len(self._unprocessed_msg_q)=%s" %
                     len(self._unprocessed_msg_q))
            self._unprocessed_msg_q.append(msg)
            LOG.info("After: MessageConsumerThread::receive_msg: "
                     "len(self._unprocessed_msg_q)=%s" %
                     len(self._unprocessed_msg_q))

    def _queue_unprocessed_msgs(self):
        with self._lock:
            LOG.info("MessageConsumerThread::_queue_unprocessed_msgs: "
                     "len(self._unprocessed_msg_q)=%s" %
                     len(self._unprocessed_msg_q))
            if self._unprocessed_msg_q:
                LOG.info("Moving messages from unprocessed to in_process queue")
                self._in_process_msg_q += self._unprocessed_msg_q
                self._unprocessed_msg_q = []
                LOG.info("Moved messages from unprocessed to in_process queue")

    def run(self):
        while not self._stop_processing:
            # Allow other threads to add messages to message queue
            time.sleep(1)

            # Move unprocessed listeners to in-process listener queue
            self._queue_unprocessed_msgs()

            # If nothing to process continue the loop
            if not self._in_process_msg_q:
                continue

            for msg in self._in_process_msg_q:
                self.consume_message(msg)

            # Clean up processed messages
            del self._in_process_msg_q[:]

    def consume_message(self, msg):
        print(msg)


class MessageProducerThread(threading.Thread):
    def __init__(self, producer_id, msg_receiver):
        super(MessageProducerThread, self).__init__()
        self._producer_id = producer_id
        self._msg_receiver = msg_receiver

    def start_producing_msgs(self):
        self.start()

    def run(self):
        for i in range(1,10):
            msg = "From: %s; Message:%s" %(self._producer_id, i)
            self._msg_receiver.receive_msg(msg)


def main():
    msg_receiver_thread = MessageConsumerThread()
    msg_receiver_thread.start_msg_processing_thread()

    msg_producer_thread = MessageProducerThread(producer_id='Producer-01',
                                                msg_receiver=msg_receiver_thread)
    msg_producer_thread.start_producing_msgs()
    msg_producer_thread.join()
    msg_receiver_thread.stop_msg_processing_thread()
    msg_receiver_thread.join()

if __name__ == '__main__':
    main()

Following is the log that I get:

INFO: MessageConsumerThread::_queue_unprocessed_msgs: len(self._unprocessed_msg_q)=0
INFO: Before: MessageConsumerThread::receive_msg: len(self._unprocessed_msg_q)=0
INFO: After: MessageConsumerThread::receive_msg: **len(self._unprocessed_msg_q)=1**
INFO: MessageConsumerThread::_queue_unprocessed_msgs: **len(self._unprocessed_msg_q)=0**
INFO: MessageConsumerThread::_queue_unprocessed_msgs: len(self._unprocessed_msg_q)=0
INFO: Before: MessageConsumerThread::receive_msg: len(self._unprocessed_msg_q)=1
INFO: After: MessageConsumerThread::receive_msg: **len(self._unprocessed_msg_q)=2**
INFO: MessageConsumerThread::_queue_unprocessed_msgs: **len(self._unprocessed_msg_q)=0**

This is not a good design for your application. I spent some time trying to debug this, but threading code is naturally complicated, so we should try to simplify it instead of making it even more confusing.

When I see threading code in Python, I usually see it written in a procedural form: a normal function that is passed to threading.Thread as the target argument and drives each thread. That way, you don't need to write a new class that will only ever have a single instance.
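As a minimal sketch of that procedural style (the function name `worker`, its arguments, and the `results` list are made up for illustration), a plain function is handed to `threading.Thread` through `target`, and no subclass is involved:

```python
import threading

results = []


def worker(name, count):
    # Runs in the new thread; this is a plain function, not a Thread subclass.
    for i in range(count):
        results.append("%s: %d" % (name, i))


t = threading.Thread(target=worker, args=("worker-1", 3))
t.start()
t.join()  # wait for the worker to finish before reading results
print(results)
```

Because the main thread only reads `results` after `join()`, this small example needs no lock.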

Another thing is that, although Python's global interpreter lock itself guarantees lists won't get corrupted if modified in two separate threads, lists are not a recommended data structure for passing data between threads. You should probably look at the Queue class (queue.Queue in Python 3, Queue.Queue in Python 2) for that.

The thing that looks wrong in this code at first sight is probably not the cause of your problem, since you use locks, but it might be. Instead of

self._unprocessed_msg_q = []

which creates a new list object that the other thread momentarily has no reference to (so it might write data to the old list), you should do:

self._unprocessed_msg_q[:] = []

Or just use the del slice form you already use in the other method (del self._in_process_msg_q[:]).
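The difference between rebinding a name and clearing a list in place can be seen in a small single-threaded sketch. The `Holder` class and the `alias` variable are illustrative stand-ins for the consumer object and for a reference held elsewhere, not part of the original code:

```python
class Holder(object):
    def __init__(self):
        self.items = [1, 2, 3]


h = Holder()
alias = h.items  # a second reference to the same list object

# Rebinding: h.items now names a brand-new list; alias still sees the old one.
h.items = []
same_after_rebind = alias is h.items
alias_after_rebind = list(alias)

# Point h.items back at the original list, then clear it in place.
h.items = alias
h.items[:] = []
same_after_inplace = alias is h.items
alias_after_inplace = list(alias)

print(same_after_rebind, alias_after_rebind)    # False [1, 2, 3]
print(same_after_inplace, alias_after_inplace)  # True []
```

With the in-place form, every holder of a reference keeps looking at the same list object, so nothing can be appended to a list that has been silently replaced.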

But to be on the safer side, and to have more maintainable and less surprising code, you really should change to a procedural approach there if you stay with Python threading. Assume "Thread" is the "final" object that can do its thing, and then use Queues around it:

# coding: utf-8
from __future__ import print_function
from __future__ import unicode_literals

from threading import Thread
try:
    from queue import Queue, Empty
except ImportError:
    from Queue import Queue, Empty
import time
import random


TERMINATE_SENTINEL = object()
NO_DATA_SENTINEL = object()


class Receiver(object):

    def __init__(self, queue):
        self.queue = queue
        self.in_process = []

    def receive_data(self, data):
        self.in_process.append(data)

    def consume_data(self):
        print("received data:",  self.in_process)
        del self.in_process[:]

    def receiver_loop(self):
        queue = self.queue
        while True:
            try:
                data = queue.get(block=False)
            except Empty:
                print("got no data from queue")
                data = NO_DATA_SENTINEL

            if data is TERMINATE_SENTINEL:
                print("Got sentinel: exiting receiver loop")
                break

            # Skip the placeholder used when the queue had nothing for us.
            if data is not NO_DATA_SENTINEL:
                self.receive_data(data)

            time.sleep(random.uniform(0, 0.3))
            if queue.empty():
                # Only process data if we have nothing to receive right now:
                self.consume_data()
                print("sleeping receiver")
                time.sleep(1)
        if self.in_process:
            self.consume_data()


def producer_loop(queue):
    for i in range(10):
        time.sleep(random.uniform(0.05, 0.4))
        print("putting {0} in queue".format(i))
        queue.put(i)


def main():
    msg_queue = Queue()
    msg_receiver_thread = Thread(target=Receiver(msg_queue).receiver_loop)
    time.sleep(0.1)
    msg_producer_thread = Thread(target=producer_loop, args=(msg_queue,))

    msg_receiver_thread.start()
    msg_producer_thread.start()
    msg_producer_thread.join()
    msg_queue.put(TERMINATE_SENTINEL)
    msg_receiver_thread.join()

if __name__ == '__main__':
    main()

Note that since you want multiple methods in the receiver thread to do things with the data, I used a class, but it does not inherit from Thread and does not have to worry about its workings. All its methods are called within the same thread: no need for locks, no worries about race conditions within the receiver class itself. For communicating outside the class, the Queue class is structured to handle any race conditions for us.

The producer loop, as it is just a dummy producer, has no need at all to be written in class form. But it would look just the same if it had more methods.

(The random sleeps help visualize what would happen in "real world" message receiving.) Also, you might want to take a look at something like: https://www.thoughtworks.com/insights/blog/composition-vs-inheritance-how-choose

Finally I was able to solve the issue. In the actual code, I have a Manager class that is responsible for instantiating MessageConsumerThread as the last thing in its initializer:

class Manager(object):
    def __init__(self):
        ...
        ...
        self._consumer = MessageConsumerThread(self)
        self._consumer.start_msg_processing_thread()

The problem seems to be with passing 'self' to the MessageConsumerThread initializer while Manager is still executing its own initializer (even though those are the last two steps). The moment I moved the creation of the consumer out of the initializer, the consumer thread was able to see the elements in "_unprocessed_msg_q".
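The post does not show how the creation was moved, so the following is only a sketch of one possible arrangement. The method name `start_consumer` is hypothetical, and `MessageConsumerThread` here is a bare stand-in that accepts the manager and does no real work:

```python
import threading


class MessageConsumerThread(threading.Thread):
    # Stand-in for the real consumer; only the constructor signature matters.
    def __init__(self, manager):
        super(MessageConsumerThread, self).__init__()
        self._manager = manager

    def run(self):
        pass  # real message processing would go here


class Manager(object):
    def __init__(self):
        # Only set up state here; the thread is neither created nor started.
        self._consumer = None

    def start_consumer(self):  # hypothetical method name
        # Called by the owner after Manager() has fully returned.
        self._consumer = MessageConsumerThread(self)
        self._consumer.start()
        return self._consumer


manager = Manager()
consumer = manager.start_consumer()
consumer.join()
```

With this layout the consumer only ever receives a fully constructed Manager, which matches the change described above without claiming to explain why the original arrangement misbehaved.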

Please note that the issue is still not reproducible with the above sample code. It manifests itself in the production environment only. Without the above fix, I tried a queue and a dictionary as well but observed the same issue. After the fix, I tried with a queue and a list and was able to successfully execute the code.

I really appreciate and thank @jsbueno and @ivan_pozdeev for their time and help! The community @stackoverflow is very helpful!
