
threads with readline

I have a huge file with hundreds of thousands of lines, and I need to run the same process on each line. My plan was to use several threads to speed this up. Whenever I have used multithreading before, I used the threading and Queue modules, but I cannot figure out how to apply a queue here. What I really need to do is read the file line by line, because the file is too large to read in all at once. I thought that maybe I could add one item to the queue at a time with .put() and then immediately pass it to a thread, but it seems that if I did this the threads could conflict. Any suggestions?

How much processing is there per line?

If not a lot, then multiple threads contending for the device the file is on might actually slow things down. You might want to split the file beforehand and put the pieces on different devices. Then it's a simple matter of firing up a process per file or per group of files.

I'd use the Unix commands split and xargs -P for this.
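
If you would rather stay with threads and a queue, here is a minimal sketch of how that could look, not a definitive implementation. It assumes Python 3 (where the Queue module is named queue), and process_line, run, num_threads and max_pending are hypothetical names made up for illustration. The queue is bounded, so the reader blocks in .put() once enough lines are waiting and the whole file never sits in memory. Each line is handed to exactly one worker by .get(), so the threads do not conflict over input.

```python
import queue
import threading


def process_line(line):
    # Hypothetical stand-in for the real per-line work.
    return len(line.rstrip("\n"))


def worker(q, results, lock):
    # Pull lines until the sentinel (None) arrives.
    while True:
        line = q.get()
        if line is None:
            break
        value = process_line(line)
        with lock:  # protect the shared results list
            results.append(value)


def run(path, num_threads=4, max_pending=1000):
    # Bounded queue: put() blocks when max_pending lines are waiting,
    # so only a small window of the file is held in memory.
    q = queue.Queue(maxsize=max_pending)
    results = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(q, results, lock))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    try:
        with open(path) as f:
            for line in f:  # reads the file lazily, one line at a time
                q.put(line)
    finally:
        # One sentinel per worker so every thread exits,
        # even if reading the file failed part way through.
        for _ in threads:
            q.put(None)
        for t in threads:
            t.join()
    return results
```

Results come back in whatever order the workers finish, not in file order. Also note that in CPython, threads only help if the per-line work waits on I/O; for CPU-heavy work the separate-process approach above is likely the better fit.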

