Writing data to disk in a separate thread (in parallel)

Question

I would like to start a function multiple times in a loop. Each time, the function acquires an image from a camera and writes it to disk, and the loop should not wait for this to finish. So every time the function is called, it runs in parallel with the loop that started it, and I can keep doing other time-sensitive work in the meantime.

I have made the example below, which makes the first "execution" of the function run in parallel with the loop but fails the second time, because I cannot .start() the same thread twice. Can this be achieved by other means?

Example (original post - updated below)

import numpy as np
import threading
import time

def imacq():
    print('acquiring image...')
    time.sleep(1.8)
    print('saved image...')
    return

# Create the image acquisition / disk-writing thread (it is started inside the loop)
imacq_thread = threading.Thread(target=imacq)

starttime = time.time()
sig_arr = np.zeros(100)
tim_arr = np.zeros(100)
image_cycles = 5
running = True
flag = True
for cycles in range(1,20):
    print(cycles)
    if cycles%image_cycles == 0:
        if flag is True:
            # .start() works the first time as intended, but the second call
            # raises RuntimeError: threads can only be started once
            imacq_thread.start()
            # imacq()  # calling it directly does not work: everything is paused until imacq() returns
            flag = False
    else:
        flag = True
    time.sleep(0.4)
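Because the error comes from reusing a single Thread object, the most direct workaround is to create a new thread for every acquisition. A minimal sketch, assuming a hypothetical imacq(results, cycle) that only sleeps in place of the camera and disk I/O and records which cycle triggered it:

```python
import threading
import time


def imacq(results, cycle):
    # Hypothetical stand-in for "acquire an image and write it to disk":
    # it only sleeps and records which cycle triggered it.
    time.sleep(0.2)
    results.append(cycle)


def main(n_cycles=19, image_cycles=5, period=0.05):
    results = []
    threads = []
    for cycles in range(1, n_cycles + 1):
        if cycles % image_cycles == 0:
            # A Thread object can only be started once, so build a fresh one
            # for every acquisition instead of reusing a single instance.
            t = threading.Thread(target=imacq, args=(results, cycles))
            t.start()
            threads.append(t)
        time.sleep(period)  # the loop keeps running while imacq works
    # Wait for any acquisitions still in flight before returning.
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(main())
```

Since each cycle number occurs only once in the for loop, the flag from the original example is not needed here. The order of the returned list is not guaranteed, because the threads may finish in any order.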

EDIT: After feedback from Sylvaus, I have made two different versions for triggering a function that will eventually be used to acquire an image and store it on the drive, in parallel with a main script that decides when to send a trigger (i.e., execute the function). One version is based on Sylvaus' answer (threading) and the other is based on multiprocessing.

Example based on Sylvaus's answer (threading):

import matplotlib.pyplot as plt
import numpy as np
import time
from concurrent.futures import ThreadPoolExecutor


def imacq():
    print('taking image')
    n = 10000
    np.ones((n, n))*np.ones((n, n))  # calculations taking time
    print('saving image')
    return


sig_arr = np.zeros(100)
tim_arr = np.zeros(100)
image_cycles = 20
max_cycles = 100
freq = 10
cycles = 1
sigSign = 1

running = True
flag = True
tic = time.time()
tic2 = tic
timeinc = np.zeros(max_cycles)
starttime = time.time()
with ThreadPoolExecutor() as executor:
    while running:
        t = time.time()-starttime
        tim_arr[:-1] = tim_arr[1:]
        tim_arr[-1] = t
        signal = np.sin(freq*t*(2.0*np.pi))
        sig_arr[:-1] = sig_arr[1:]
        sig_arr[-1] = signal

        time.sleep(0.00001)
        # Calculate cycle number
        sigSignOld = sigSign
        sigSign = np.sign(sig_arr[-1]-sig_arr[-2])
        if sigSign == 1 and sigSignOld != sigSign:
            timeinc[cycles] = time.time()-tic
            cycles += 1
            print('cycles: ', cycles, ' time inc.: ', str(timeinc[cycles-1]))
            tic = time.time()

        if cycles%image_cycles == 0:
            if flag is True:
                # The function is submitted and will be processed by a
                # thread as soon as one is available
                executor.submit(imacq)
                flag = False
        else:
            flag = True
        if cycles >= max_cycles:
            running = False

print('total time: ', time.time()-tic2)

plt.figure()
plt.plot(timeinc)
plt.xlabel('cycle')
plt.ylabel('time increment (s)')
plt.show()
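executor.submit returns a Future, so the threading version can also detect an acquisition that has not finished yet, similar to the WARNING in the multiprocessing version. A sketch with a hypothetical imacq(cycle) that sleeps instead of touching a camera; skipping an overlapping trigger (rather than queueing it) is a design choice made here, not something the original code does:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def imacq(cycle):
    # Hypothetical stand-in for acquiring and saving an image.
    time.sleep(0.1)
    return cycle


def run(n_cycles=20, image_cycles=4, period=0.01):
    pending = None   # Future of the most recent acquisition
    skipped = []     # trigger cycles where the previous job was still running
    done = []        # cycles whose acquisition completed
    with ThreadPoolExecutor(max_workers=1) as executor:
        for cycles in range(1, n_cycles + 1):
            if cycles % image_cycles == 0:
                if pending is not None and not pending.done():
                    skipped.append(cycles)  # previous image not saved yet
                else:
                    if pending is not None:
                        # result() also re-raises any exception from imacq
                        done.append(pending.result())
                    pending = executor.submit(imacq, cycles)
            time.sleep(period)
    # Leaving the with-block waits for the last submitted job to finish.
    if pending is not None:
        done.append(pending.result())
    return done, skipped


if __name__ == "__main__":
    print(run())
```

With these numbers a trigger arrives roughly every 40 ms while each job takes 100 ms, so some triggers are expected to be skipped; the exact split depends on scheduling.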

Example based on multiprocessing:

import matplotlib.pyplot as plt
import numpy as np
import time
from multiprocessing import Process, Value, Lock


def trig_resp(running, trigger, p_count, pt, lock):
    while running.value == 1:  # note ".value" on each sharedctypes variable
        time.sleep(0.0001)  # sleep so as not to load the CPU excessively
        if trigger.value == 1:
            with lock:  # lock the "global" variable before writing to it
                trigger.value = 0  # reset trigger
            tic = time.time()
            # Do a calculation that takes a significant time
            n = 10000; np.ones((n, n))*np.ones((n, n))
            with lock:
                pt.value = time.time() - tic  # calculate process time
                p_count.value += 1  # count number of finished processes
    return


if __name__ == "__main__":
    # initialize shared values (global across processes/sharedctypes).
    # Type 'i': integer, type 'd': double.
    trigger = Value('i', 0)  # used to trigger execution placed in trig_resp()
    running = Value('i', 1)  # A way to break the loop in trig_resp()
    p_count = Value('i', 0)  # process counter and flag that process is done
    pt = Value('d', 0.0)  # process time of latest finished process
    lock = Lock()  # lock object used to avoid race conditions when changing "global" values.
    p_count_old = p_count.value
    p1 = Process(target=trig_resp, args=(running, trigger, p_count, pt, lock))
    p1.start()  # Start process

    # A "simulated" sinusoidal signal
    array_len = 50
    sig_arr = np.zeros(array_len)  # Signal array
    tim_arr = np.zeros(array_len)  # Corresponding time array
    freq = 10  # frequency of signal

    # trigger settings
    im_int = 20  # cycle interval for triggering (acquiring images)
    max_cycles = 100  # max number of cycles before stopping main

    # initializing counters etc.
    cycles = 1  # number of cycles counted
    sigSign = 1  # sign of signal gradient
    flag = 1  # used to only set trigger once for the current cycle count
    trigger_count = 0  # counts how many times a trigger has been set

    tic = time.time()
    tic2 = tic
    timeinc = np.zeros(max_cycles) # Array to keep track of time used for each main loop run
    starttime = time.time()
    while running.value == 1:
        time.sleep(0.00001)  # mimics sample time (real world signal)
        t = time.time()-starttime  # local time
        signal = np.sin(freq*t*(2.0*np.pi))  # simulated signal
        # Keeping the latest array_len values (FIFO) of t and signal.
        tim_arr[:-1] = tim_arr[1:]
        tim_arr[-1] = t
        sig_arr[:-1] = sig_arr[1:]
        sig_arr[-1] = signal

        if p_count.value == p_count_old + 1:  # process has finished
            print('Process counter: ', p_count.value,  'process_time: ', pt.value)
            p_count_old = p_count.value

        # Calculate cycle number by monitoring the sign of the gradient
        sigSignOld = sigSign  # Keeping track of previous signal gradient sign
        sigSign = np.sign(sig_arr[-1]-sig_arr[-2])  # current gradient sign
        if sigSign == 1 and sigSignOld == -1:  # a local minimum just happened
            timeinc[cycles] = time.time()-tic
            cycles += 1
            print('cycles: ', cycles, ' time inc.: ', str(timeinc[cycles-1]))
            tic = time.time()
            flag = 1

        if cycles % im_int == 0 and flag == 1:
            if cycles > 0:
                if trigger_count > p_count.value:
                    print('WARNING: Process: ', p_count.value,
                          'did not finish yet. Reduce freq or increase im_int')
                trigger.value = 1
                trigger_count += 1
                print('Trigger number: ', trigger_count)
                flag = 0

        if cycles >= max_cycles:
            running.value = 0

    print('total cycle time: ', time.time()-tic2)

    p1.join()  # wait for the worker to finish its current job and exit

    # Print the process time of the last run
    if p_count.value < max_cycles//im_int:
        if p_count.value == p_count_old + 1:
            print('process counter: ', p_count.value,  'process_time: ', pt.value)
            p_count_old = p_count.value

    print('total process time: ', time.time()-tic2)

    plt.figure()
    plt.plot(timeinc)
    plt.xlabel('cycle')
    plt.ylabel('time increment (s)')
    plt.show()
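One way to make the multiprocessing version simpler and lighter on the CPU is to replace the polled trigger Value and the time.sleep(0.0001) loop with multiprocessing.Event objects, so the worker blocks until it is triggered. A minimal sketch under that assumption; time.sleep(0.05) stands in for the image acquisition, and run_demo is a hypothetical driver, not the full signal loop above:

```python
import multiprocessing as mp
import time


def trig_resp(stop, trigger, p_count, pt):
    # Block inside Event.wait() until triggered instead of polling a shared
    # Value; the timeout only exists so the stop flag is re-checked.
    while not stop.is_set():
        if not trigger.wait(timeout=0.1):
            continue
        trigger.clear()  # re-arm, like "trigger.value = 0" in the original
        tic = time.perf_counter()
        time.sleep(0.05)  # hypothetical stand-in for acquiring/saving an image
        pt.value = time.perf_counter() - tic  # write the time before the count
        with p_count.get_lock():
            p_count.value += 1


def run_demo(n_triggers=3):
    stop, trigger = mp.Event(), mp.Event()
    p_count = mp.Value('i', 0)  # Value('i') carries its own lock
    pt = mp.Value('d', 0.0)
    p = mp.Process(target=trig_resp, args=(stop, trigger, p_count, pt))
    p.start()
    for _ in range(n_triggers):
        trigger.set()    # what the main loop does when it wants an image
        time.sleep(0.2)  # the main loop keeps doing its own work here
    stop.set()
    p.join()
    return p_count.value, pt.value


if __name__ == "__main__":
    print(run_demo())
```

As in the original, a trigger set while the worker is busy is not lost but is coalesced: several triggers during one job produce one extra run afterwards. If every trigger must produce an image, a queue (as in Answer 2) is the better fit.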

I am on a Windows 10 laptop, so the timing (the time increment in each iteration of the main "while running...:" loop) depends on what else is happening on my computer, but the multiprocessing version seems less sensitive to this than the threading version. However, the multiprocessing version is not very elegant, and I suspect a smarter solution is possible (simpler and less error-prone) that achieves the same or better results (consistent time increments with a lower CPU load).
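On the point about consistent time increments: both loops above sleep for a few microseconds per iteration, so the sample rate is whatever the OS scheduler happens to deliver. A different technique, not used in either example, is to schedule each iteration against an absolute deadline so a slow iteration does not push all later ones back. A sketch, where fixed_rate_loop and its work callback are hypothetical names:

```python
import time


def fixed_rate_loop(period, n_iter, work=lambda i: None):
    """Call work(i) every `period` seconds, scheduling against absolute
    deadlines so that one slow iteration does not shift all later ones."""
    stamps = []
    next_deadline = time.perf_counter()
    for i in range(n_iter):
        stamps.append(time.perf_counter())
        work(i)  # e.g. sample the signal and decide whether to trigger
        next_deadline += period
        delay = next_deadline - time.perf_counter()
        if delay > 0:
            time.sleep(delay)  # one blocking sleep instead of many tiny ones
    return stamps


if __name__ == "__main__":
    s = fixed_rate_loop(0.01, 100)
    print("mean interval (s):", (s[-1] - s[0]) / (len(s) - 1))
```

The achievable accuracy still depends on the platform's sleep resolution, so this is worth measuring on the target machine rather than assuming.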

I have attached graphs of the time increments I get for the multiprocessing and threading examples, respectively.

Any feedback on improving the two solutions is much appreciated.

Answer 1

You can use an Executor. This way, you can simply submit your tasks, and they will be processed according to the type of Executor you are using.

I don't know what is in your imacq, so you may have to try both ThreadPoolExecutor and ProcessPoolExecutor to find out which one fits your application best.

Example:

import numpy as np
import time
from concurrent.futures import ThreadPoolExecutor

def imacq():
    print('acquiring image...')
    time.sleep(1.8)
    print('saved image...')
    return

starttime = time.time()
sig_arr = np.zeros(100)
tim_arr = np.zeros(100)
image_cycles = 5
running = True
flag = True

with ThreadPoolExecutor() as executor:
    for cycles in range(1,20):
        print(cycles)
        if cycles%image_cycles == 0:
            if flag is True:
                # The function is submitted and will be processed by a
                # thread as soon as one is available
                executor.submit(imacq)
                flag = False
        else:
            flag = True
        time.sleep(0.4)
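Trying both executors only requires changing one line if the loop receives the executor as a parameter. A sketch with a hypothetical imacq(cycle); for ProcessPoolExecutor the submitted function must be defined at module level (so it can be pickled) and the pool must be created under the __main__ guard. Keeping the returned Futures also surfaces exceptions, which a bare executor.submit(imacq) would otherwise hide:

```python
import time
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor


def imacq(cycle):
    # Hypothetical stand-in for acquiring and saving an image. It is a
    # module-level function so ProcessPoolExecutor can pickle it.
    time.sleep(0.05)
    return cycle


def run(executor: Executor, n_cycles=19, image_cycles=5):
    futures = []
    for cycles in range(1, n_cycles + 1):
        if cycles % image_cycles == 0:
            futures.append(executor.submit(imacq, cycles))
        time.sleep(0.01)
    # result() waits for each job and re-raises any exception from imacq.
    return [f.result() for f in futures]


if __name__ == "__main__":
    # The loop body is identical; only the executor class changes.
    with ThreadPoolExecutor() as ex:
        print("threads:  ", run(ex))
    with ProcessPoolExecutor() as ex:
        print("processes:", run(ex))
```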

Answer 2

The details of your acquisition devices, data rates and volumes are not very clear, but I get the impression that you want to acquire one signal as fast as possible and, whenever that signal is "interesting", get an image captured and written to disk as soon as possible, without delaying the next acquisition of the signal.

So there seems to be minimal data exchange needed between the main signal-acquisition process and the image-capture process. IMHO, that suggests multiprocessing (so no GIL) and a queue to communicate between the two processes (there are no large volumes of data to pickle).

So, I would be looking at this type of setup:

#!/usr/bin/env python3

from multiprocessing import Process, Queue, freeze_support

def ImageCapture(queue):
    while True:
        # Wait till told to capture image - message could contain event reference number
        item = queue.get()
        if item == -1:
            break
        # Capture image and save to disk

def main():
    # Create queue to send image capture requests on
    queue = Queue(8)

    # Start image acquisition process
    p = Process(target=ImageCapture, args=(queue,))
    p.start()

    # do forever
    #    acquire from DAQ
    #    if interesting
    #       queue.put(event reference number or filename)

    # Stop image acquisition process
    queue.put(-1)
    p.join()

if __name__ == "__main__":

    # Only needed if the program is frozen into a Windows executable; has no effect otherwise
    freeze_support()
    main()

If the ImageCapture() process can't keep up, start two or more.
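With more than one ImageCapture() process, each worker exits only after reading its own stop message, so one sentinel has to be queued per worker. A sketch of that, assuming None as the sentinel (instead of -1) and a hypothetical done queue that only reports which requests were handled:

```python
from multiprocessing import Process, Queue

SENTINEL = None  # hypothetical stop message; must never be a real request


def image_capture(queue, done):
    while True:
        item = queue.get()
        if item is SENTINEL:
            break
        # Capture the image and save it to disk here (omitted); this sketch
        # only reports which request was handled.
        done.put(item)


def main(n_workers=2, n_requests=6):
    queue, done = Queue(8), Queue()
    workers = [Process(target=image_capture, args=(queue, done))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    for i in range(n_requests):
        queue.put(i)  # e.g. an event reference number or a filename
    # Each worker stops after one sentinel, so send one per worker.
    for _ in workers:
        queue.put(SENTINEL)
    # Drain results before join() so no worker blocks on a full pipe.
    handled = sorted(done.get() for _ in range(n_requests))
    for w in workers:
        w.join()
    return handled


if __name__ == "__main__":
    print(main())
```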

On my Mac, I measured a mean message delivery time on a queue of 32 microseconds, and a maximum latency of 120 microseconds, over 1 million messages.
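The answer does not show how those numbers were measured. One way to get a comparable figure on your own machine, sketched here as an assumption rather than the answerer's method, is to bounce messages off a second process and time the round trip with time.perf_counter() inside a single process, which avoids comparing clocks between processes:

```python
import time
from multiprocessing import Process, Queue


def echo(inbox, outbox):
    # Send every message straight back until the None sentinel arrives.
    for msg in iter(inbox.get, None):
        outbox.put(msg)


def measure_round_trip(n=1000):
    inbox, outbox = Queue(), Queue()
    p = Process(target=echo, args=(inbox, outbox))
    p.start()
    rtts = []
    for i in range(n):
        t0 = time.perf_counter()
        inbox.put(i)
        outbox.get()
        rtts.append(time.perf_counter() - t0)
    inbox.put(None)
    p.join()
    return sum(rtts) / n, max(rtts)


if __name__ == "__main__":
    mean, worst = measure_round_trip()
    # A round trip is two deliveries, so half of it roughly estimates one.
    print(f"mean RTT {mean * 1e6:.1f} us, max RTT {worst * 1e6:.1f} us")
```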

2 answers

Answer 1: score 0, 2020-05-29 20:41:05

Answer 2: score 0, accepted, 2020-06-06 15:01:17
