How to recursively traverse a directory using ThreadPoolExecutor?

My real task is to recursively traverse a remote directory using paramiko with multiple threads. For simplicity, I use the local filesystem to demonstrate the problem:

from pathlib import Path
from typing import List
from concurrent.futures import ThreadPoolExecutor, Executor

def listdir(root: Path, executor: Executor) -> List[Path]:
    if root.is_dir():
        xss = executor.map(lambda d: listdir(d, executor), root.glob('*'))
        return sum(xss, [])
    return [root]

with ThreadPoolExecutor(4) as e:
    listdir(Path('.'), e)

However, the code above never finishes.

What is wrong with my code? And how can I fix it (preferably with an Executor rather than raw Thread objects)?

EDIT: I have confirmed @Sraw's answer with the following code:

In [4]: def listdir(root: Path, executor: Executor) -> List[Path]:
   ...:     print(f'Enter {root}', flush=True)
   ...:     if root.is_dir():
   ...:         xss = executor.map(lambda d: listdir(d, executor), root.glob('*'))
   ...:         return sum(xss, [])
   ...:     return [root]
   ...:

In [5]: with ThreadPoolExecutor(4) as e:
   ...:     listdir(Path('.'), e)
   ...:
Enter .
Enter NonRestrictedShares
Enter corporateActionData
Enter RiskModelAnnualEPS
Enter juyuan

There is a deadlock in your code.

Because you are using ThreadPoolExecutor(4), this executor has only four worker threads, so no more than four tasks can run at the same time.
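The same effect can be reproduced in miniature. The following is a minimal sketch, not the code from the question: it assumes a pool with a single worker, and the names outer and outcome are made up for illustration. A timeout is used so that the demonstration ends instead of hanging.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def outer(executor):
    # This function runs on the pool's only worker thread. It submits a
    # second task and then waits for it, but that task needs the very
    # worker that is busy waiting here.
    inner = executor.submit(lambda: "inner done")
    try:
        return inner.result(timeout=0.5)
    except TimeoutError:
        # Without the timeout this wait would never return.
        return "timed out: no free worker for the inner task"


with ThreadPoolExecutor(1) as executor:
    outcome = executor.submit(outer, executor).result()

print(outcome)
```

Once outer gives up and returns, the worker becomes free and the inner task finally runs, which is why the with block can still exit cleanly.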

Imagine the following simple structure:

test
----script.py
----test1
--------test2
------------test3
----------------test4
--------------------test5

If you run python script.py, the first worker thread handles test1, the second handles test1/test2, the third handles test1/test2/test3, and the fourth handles test1/test2/test3/test4. Each of them is blocked waiting for the result of its own subdirectory, so the worker threads are now exhausted. But another task, test1/test2/test3/test4/test5, has been inserted into the work queue.

No worker is ever free to run it, so the program hangs forever.
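One possible way around this, sketched here as an assumption and not as something the answer itself prescribes, is to make sure no task ever waits on another task. Each task lists a single directory level and returns, and only the calling thread waits on futures and submits follow-up work. The helper name scan is hypothetical; listdir keeps the signature from the question and, like the original, returns only non-directory entries.

```python
from concurrent.futures import FIRST_COMPLETED, Executor, ThreadPoolExecutor, wait
from pathlib import Path
from typing import List, Tuple


def scan(directory: Path) -> Tuple[List[Path], List[Path]]:
    """List one directory level. Never submits or waits on other tasks."""
    files: List[Path] = []
    subdirs: List[Path] = []
    for entry in directory.iterdir():
        (subdirs if entry.is_dir() else files).append(entry)
    return files, subdirs


def listdir(root: Path, executor: Executor) -> List[Path]:
    if not root.is_dir():
        return [root]
    found: List[Path] = []
    # Futures that have been submitted but not yet collected.
    pending = {executor.submit(scan, root)}
    while pending:
        # Only the calling thread blocks here; worker threads stay free.
        done, pending = wait(pending, return_when=FIRST_COMPLETED)
        for future in done:
            files, subdirs = future.result()
            found.extend(files)
            # Schedule one new task per subdirectory that was discovered.
            pending.update(executor.submit(scan, d) for d in subdirs)
    return found


if __name__ == "__main__":
    with ThreadPoolExecutor(4) as e:
        for path in listdir(Path("."), e):
            print(path)
```

Because worker threads never block on futures, the nesting depth of the tree no longer matters, even with fewer workers than levels. The order of the returned paths depends on which tasks finish first, so sort the result if a stable order is needed.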
