
How can I update a list of lists very quickly in a thread-safe manner? - python

I am writing a script that adds a "column" to a Python list of lists at 500 Hz. Here is the code that generates test data and appends it from a separate thread:

# fileA
import random, time, threading
data = [[] for _ in range(4)]  # list with 4 empty lists (4 rows)
column = [random.random() for _ in data]  # synthetic column of data
def synthesize_data():
    while True:
        for x,y in zip(data,column):
            x.append(y)
        time.sleep(0.002)  # equivalent to 500 Hz
t1 = threading.Thread(target=synthesize_data)
t1.start()  # note: Thread.start() returns None, so don't assign its result
# example of data
# [[0.61523098235, 0.61523098235, 0.61523098235, ... ],
# [0.15090349809, 0.15090349809, 0.15090349809, ... ],
# [0.92149878571, 0.92149878571, 0.92149878571, ... ],
# [0.41340918409, 0.41340918409, 0.41340918409, ... ]]

# fileB (in Jupyter Notebook)
[1] import fileA, copy

[2] # get a copy of the data at this instant.
    data = copy.deepcopy(fileA.data)
    for row in data:
        print(len(row))

If you run cell [2] in fileB, you should see that the lengths of the "rows" in data are not equal. Here is example output when I run the script:

8784
8786
8787
8787

I thought I might be grabbing the data in the middle of the for loop, but that would suggest the lengths would be off by at most 1. The differences get more severe over time. My question: why is quickly adding columns to a list of lists unstable? Is it possible to make this process stable?

You might suggest I use something like Pandas, but I want to use Python lists because of their speed advantage (the code needs to be as fast as possible). I tested the for loop, the map() function, and a Pandas data frame. Here is my test code (in Jupyter Notebook):

# Setup code
import pandas as pd
import random
channels = ['C3','C4','C5','C2']
a = [[] for _ in channels]
b = [random.random() for _ in a]
def add_col(pair):
    x, y = pair  # tuple parameters like def add_col((x, y)) are Python 2 only
    x.append(y)
df = pd.DataFrame(index=channels)
b_pandas = pd.Series(b, index=df.index)

%timeit for x,y in zip(a,b): x.append(y)  # 1000000 loops, best of 3: 1.32 µs per loop
%timeit map(add_col, zip(a,b))  # 1000000 loops, best of 3: 1.96 µs per loop (Python 2; in Python 3, map() is lazy and must be wrapped in list() to run)
%timeit df[0] = b  # 10000 loops, best of 3: 82.8 µs per loop
%timeit df[0] = b_pandas  # 10000 loops, best of 3: 58.4 µs per loop

You might also suggest that I append the samples to data as rows and then transpose when it's time to analyze. I would rather not do that either, in the interest of speed. This code will be used in a brain-computer interface, where analysis happens in a loop. The transpose would also have to happen inside the loop, and it would get slower as the data grows.

edit: changed title from "Why is quickly adding columns to a list of lists unstable - python" to "How can I update a list of lists very quickly in a thread-safe manner? - python" to make the title more descriptive of the post and the answer.

The deepcopy() operation is copying the lists while they are being modified by another thread, and each copy takes a small amount of time (longer as the lists grow). So between copying the first of the 4 lists and copying the second, the other thread appended 2 more elements, which suggests that copying a list of 8784 elements takes between 0.002 and 0.004 seconds.

That's because nothing prevents the interpreter from switching threads between the synthesize_data() loop and the copy.deepcopy() call. In other words, your code is simply not thread-safe.

You'd have to coordinate between your two threads, using a lock for example:

In fileA:

# ...
datalock = threading.RLock()
# ...

def synthesize_data():
    while True:
        with datalock:
            for x,y in zip(data,column):
                x.append(y)
        time.sleep(0.002)  # equivalent to 500 Hz; sleep outside the lock so readers aren't blocked

and in fileB:

with fileA.datalock:
    data = copy.deepcopy(fileA.data)
    for row in data:
        print(len(row))

This ensures that copying only takes place when the thread in fileA is not in the middle of adding more elements to the lists.
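For illustration, here is a self-contained sketch of the locking pattern (not the original fileA/fileB split; the writer thread, rates, and run length are synthetic) showing that a lock-guarded snapshot keeps all row lengths equal:

```python
import copy
import random
import threading
import time

data = [[] for _ in range(4)]
datalock = threading.Lock()
stop = threading.Event()

def writer():
    # Hold the lock only while appending one value to every row,
    # so a full "column" is always added atomically.
    while not stop.is_set():
        with datalock:
            for row in data:
                row.append(random.random())
        time.sleep(0.0001)  # sleep outside the lock

t = threading.Thread(target=writer)
t.start()
time.sleep(0.05)  # let some data accumulate

with datalock:
    snapshot = copy.deepcopy(data)  # consistent: writer cannot interleave

stop.set()
t.join()

lengths = [len(row) for row in snapshot]
print(lengths[0] > 0 and all(n == lengths[0] for n in lengths))  # True
```

Without the `with datalock:` around the deepcopy, the same sketch reproduces the unequal lengths from the question.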

Using locking will slow down your operations; I suspect the pandas assignment operations are already subject to locks to keep them thread-safe.
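If the deepcopy itself becomes the bottleneck as the lists grow, one cheaper pattern (a sketch, not part of the original answer; `drain_rows` is a hypothetical helper) is to swap fresh empty rows in under the lock and let the reader take ownership of the filled ones. The lock is then held for O(rows) work regardless of how many samples have accumulated:

```python
import threading

def drain_rows(data, lock):
    # Swap in fresh empty rows under the lock and hand the filled
    # rows to the caller; no per-element copying is needed.
    with lock:
        taken = data[:]               # shallow copy of the outer list
        data[:] = [[] for _ in data]  # writer keeps appending to new rows
    return taken

# usage sketch
lock = threading.Lock()
data = [[1, 2], [3, 4]]
taken = drain_rows(data, lock)
print(taken)  # [[1, 2], [3, 4]]
print(data)   # [[], []]
```

The trade-off is that the reader receives only the samples since the last drain, so the analysis loop must accumulate them itself if it needs the full history.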
