
Perl - creating text files with some data in lesser time - using threading

What's the best way to generate 1000K text files (with Perl on Windows 7)? I want to generate those text files in as little time as possible (ideally within 5 minutes). Right now I am using Perl threading with 50 threads, and it is still taking too long.

What would be the best solution? Do I need to increase the thread count? Is there any other way to write 1000K files in under five minutes? Here is my code:

    use strict;
    use warnings;
    use threads;

    my $start = 0;
    my $end   = 10000;
    my $start_run = time();
    my @thr;                      # thread handles

    for (my $t = 0; $t < 50; $t++) {
        $thr[$t] = threads->create(\&files_write, $start, $end);
        # start again from 10000 to 20000 for the next thread, and so on
        $start += 10000;
        $end   += 10000;
    }

    for (my $t = 0; $t < 50; $t++) {
        $thr[$t]->join();
    }

    my $end_run  = time();
    my $run_time = $end_run - $start_run;
    print "Job took $run_time seconds\n";

I don't want the return results of those threads. I also tried detach(), but that didn't work for me either. Generating 500K files (each only 20 KB) took 1564 seconds (26 minutes). Can I get this down to 5 minutes?

Edited: files_write only takes values from a predefined array structure and writes them into a file. That's it.
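For reference, a minimal sketch of what such a routine might look like, assuming the predefined data lives in a @data array and the files are simply numbered; the names and content here are placeholders, not taken from the question:

    use strict;
    use warnings;

    # Placeholder data; in the real script this would be the
    # predefined array structure mentioned above.
    my @data = ('some 20 KB worth of text') x 10;

    sub files_write {
        my ($start, $end) = @_;
        for my $i ($start .. $end - 1) {
            open my $fh, '>', "file_$i.txt"
                or die "Cannot open file_$i.txt: $!";
            print {$fh} $data[ $i % @data ];   # pick a predefined entry
            close $fh;
        }
    }

    files_write(0, 100);   # writes file_0.txt .. file_99.txt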

Any other solution?

The time needed depends on lots of factors, but heavy threading is probably not the solution:

  • creating files in the same directory at the same time probably needs locking in the OS, so it is better not to do too much of that in parallel
  • the layout of how the data gets written to disk depends on the amount of data and on how many writes you do in parallel. A bad layout can hurt performance a lot, especially on an HDD. But even an SSD cannot do lots of parallel writes. This all depends a lot on the disk you use, e.g. whether it is a desktop system optimized for sequential writes or a server system that can do more parallel writes, as required by databases.
  • ... lots of other factors, often depending on the system

I would suggest that you use a thread pool with a fixed number of threads and benchmark what the optimal number of threads is for your specific hardware, e.g. start with a single thread and slowly increase the count; a sketch of such a pool follows below. My guess is that the optimal number is somewhere between 0.5 and 4 times the number of processor cores you have, but like I said, it heavily depends on your real hardware.
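A minimal sketch of such a fixed-size pool, using Thread::Queue to hand blocks of work to a configurable number of workers; the worker count, block size, and the files_write body are placeholders to tune and fill in:

    use strict;
    use warnings;
    use threads;
    use Thread::Queue;

    sub files_write {
        my ($start, $end) = @_;
        # ... the file-writing code from the question, for files $start .. $end - 1 ...
    }

    my $num_workers = 4;          # benchmark: start at 1 and increase
    my $block_size  = 10_000;
    my $num_files   = 1_000_000;

    my $queue = Thread::Queue->new();

    # Workers pull [start, end) blocks until they see the undef sentinel.
    my @workers = map {
        threads->create(sub {
            while (defined(my $block = $queue->dequeue())) {
                files_write(@$block);
            }
        });
    } 1 .. $num_workers;

    # Enqueue the work in fixed-size blocks.
    for (my $start = 0; $start < $num_files; $start += $block_size) {
        $queue->enqueue([ $start, $start + $block_size ]);
    }

    # One undef per worker tells that worker to stop.
    $queue->enqueue(undef) for 1 .. $num_workers;

    $_->join() for @workers;

Only the pool size changes between benchmark runs, so you can measure the total run time for 1, 2, 4, ... workers and keep whichever number is fastest on your disk.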

The slow performance is probably due to Windows having to lock the filesystem down while creating files.

If it is only for testing - and not critical data - a RAMdisk may be ideal. Try Googling DataRam RAMdisk.
