
I/O slowdown with multithreading in Python

I have a Python script that works on the following scheme: read a large file (e.g., a movie), compose selected information from it into a number of small temporary files, spawn a C++ application in subprocesses to perform the file processing/calculations (separately for each file), and read the application's output. To speed up the script I used multiprocessing. However, this has a major drawback: each process has to keep a whole copy of the large input file in RAM, so I can only run a few processes before running out of memory. I therefore decided to try multithreading instead (or some combination of multiprocessing and multithreading), since threads share the address space. As the Python part spends most of its time on file I/O or waiting for the C++ application to complete, I thought the GIL should not be an issue here. Nevertheless, instead of a performance gain I observe a drastic slowdown, mainly in the I/O part.
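For context, each worker in the real pipeline does roughly the following (a rough sketch only; the C++ binary name cpp_app and its command line are placeholders, not my actual tool):

import subprocess, tempfile

def process_chunk(chunk_lines):
    # Write the selected lines to a small temporary file, run the external
    # C++ tool on it in a subprocess, and return the tool's stdout.
    with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
        tmp.writelines(chunk_lines)
        tmp_path = tmp.name
    result = subprocess.run(['./cpp_app', tmp_path],  # placeholder binary
                            capture_output=True, text=True, check=True)
    return result.stdout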

I illustrate the problem with the following code (saved as test.py):

import sys, threading, tempfile, time

nthreads = int(sys.argv[1])

class IOThread(threading.Thread):
    def __init__(self, thread_id, obj):
        threading.Thread.__init__(self)
        self.thread_id = thread_id
        self.obj = obj
    def run(self):
        run_io(self.thread_id, self.obj)

def gen_object(nlines):
    # Build a list of nlines short strings to stand in for the large in-memory data
    obj = []
    for i in range(nlines):
        obj.append(str(i) + '\n')
    return obj

def run_io(thread_id, obj):
    # Each thread handles its share of 100 write/read-back cycles,
    # spreading the remainder of 100 % nthreads over the first threads
    ntasks = 100 // nthreads + (1 if thread_id < 100 % nthreads else 0)
    for i in range(ntasks):
        tmpfile = tempfile.NamedTemporaryFile('w+')
        with open(tmpfile.name, 'w') as ofile:
            for elem in obj:
                ofile.write(elem)
        with open(tmpfile.name, 'r') as ifile:
            content = ifile.readlines()
        tmpfile.close()

obj = gen_object(100000)
starttime = time.time()
threads = []
for thread_id in range(nthreads):
    threads.append(IOThread(thread_id, obj))
    threads[thread_id].start()
for thread in threads:
    thread.join()
runtime = time.time() - starttime
print('Runtime: {:.2f} s'.format(runtime))

When I run it with different numbers of threads, I get this:

$ python3 test.py 1
Runtime: 2.84 s
$ python3 test.py 1
Runtime: 2.77 s
$ python3 test.py 1
Runtime: 3.34 s
$ python3 test.py 2
Runtime: 6.54 s
$ python3 test.py 2
Runtime: 6.76 s
$ python3 test.py 2
Runtime: 6.33 s

Can someone explain this result to me, and give some advice on how to effectively parallelize I/O using multithreading?

EDIT:

The slowdown is not due to HDD performance, because:

1) the files are getting cached to RAM anyway

2) the same operations with multiprocessing (not multithreading) do indeed get faster (almost by a factor of the number of CPUs); see the sketch below
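For reference, a minimal sketch of the multiprocessing variant: replace the thread-spawning block at the bottom of test.py with the following (a sketch only, not the exact benchmark code behind the chart; it relies on Linux's fork start method, so each child process inherits, and effectively duplicates, obj):

import multiprocessing as mp

def worker(thread_id):
    # Each forked worker operates on its own copy of obj, which is exactly
    # the memory duplication described above.
    run_io(thread_id, obj)

obj = gen_object(100000)
starttime = time.time()
with mp.Pool(processes=nthreads) as pool:
    pool.map(worker, range(nthreads))
runtime = time.time() - starttime
print('Runtime: {:.2f} s'.format(runtime))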

As I delved deeper into the problem, I made comparison benchmarks of 4 different parallelisation methods, 3 of which use Python and 1 uses Java (the purpose of the test was not to compare the I/O machinery of different languages, but to see whether multithreading can boost I/O operations). The test was performed on Ubuntu 14.04.3, and all files were placed on a RAM disk.
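To place the temporary files on the RAM disk from within Python, one option is to point the tempfile module at the tmpfs mount; /dev/shm is assumed here, which is the standard tmpfs location on Ubuntu:

import tempfile
# Send all NamedTemporaryFile / TemporaryFile output to the tmpfs RAM disk,
# so the benchmark exercises the page cache rather than the physical drive.
tempfile.tempdir = '/dev/shm'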

Although the data are quite noisy, the trend is clear (see the chart; n = 5 for each bar, error bars represent SD): Python multithreading fails to boost I/O performance. The most probable reason is the GIL, and therefore there is no way around it.

[Chart: benchmark runtimes for the four parallelisation methods; n = 5 per bar, error bars = SD]

I think your performance measurements don't lie: you're asking your hard disk to do many things at the same time. Reads, writes, fsync when closing the files... and on several files at once. This triggers a lot of physical hardware operations, and the more files you write at the same time, the more contention you get.

So the CPU is waiting for the disk operations to finish...

Moreover, you may not have an SSD, so the syncs actually mean physical head movements.

EDIT: it could be a GIL problem. When you iterate over elem in obj in run_io, you execute Python code between each write. ofile.write probably releases the GIL so the I/O doesn't block the other threads, but the lock is released and re-acquired on every iteration. So maybe your writes don't really run "concurrently".

EDIT2: to test this hypothesis, you can try replacing:

for elem in obj:
    ofile.write(elem)

with:

ofile.write("".join(obj))

and see if performance gets better.
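An equivalent variant, in case joining 100 000 strings itself becomes a cost, is writelines, which iterates over obj in C code rather than in Python bytecode (a sketch of the same idea, not something I benchmarked):

with open(tmpfile.name, 'w') as ofile:
    ofile.writelines(obj)  # single call into the io module, no Python-level loop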
