Does Perl's Capture::Tiny::capture() avoid the disk I/O required when using system()?

When calling an external program from a Perl script, does Capture::Tiny avoid the disk I/O that is required when using system()? I get essentially the same performance with either. A colleague is using my code and told me that it was hammering his disks. I (perhaps) don't have this problem when running on my local machine and writing to local disks.

I was previously doing this:

open($fhStdin, ">stdin.txt");
print $fhStdin "some text\n";
close($fhStdin);
system("cmd < stdin.txt 1> stdout.txt 2> stderr.txt"); 
# open and read stdout.txt
# open and read stderr.txt

And changed to this:

($stdout, $stderr, $exit) = capture {
    open($fhStdin, '| cmd');
    print $fhStdin "some text\n";
    close($fhStdin);
};

But NYTProf tells me that they take essentially the same amount of time to run (though NYTProf removes disk I/O overheads from subroutine times). So I wondered whether capture() is writing to temporary files under the hood. (I tried reading the Tiny.pm source code but am ashamed to say I couldn't tell from that.)

Thanks for any tips.

The documentation for Capture::Tiny::capture states that files are indeed used:

Captures are normally done to an anonymous temporary filehandle.

This can be seen in the source for the _capture_tee sub, which serves as a generic routine for all methods. About halfway through this sub there is a call to File::Temp->new, unless named files are to be used (see below). The rest of the processing can be traced with some care.

The docs go on to offer a way to monitor all this via named files instead:

To capture via a named file (e.g. to externally monitor a long-running capture), provide custom filehandles as a trailing list of option pairs:

my $out_fh = IO::File->new("out.txt", "w+");
my $err_fh = IO::File->new("err.txt", "w+");
capture { ... } stdout => $out_fh, stderr => $err_fh;

The filehandles must be read/write and seekable. Modifying the files or filehandles during a capture operation will give unpredictable results. Existing IO layers on them may be changed by the capture.

(If this is done, the call to File::Temp is skipped, as mentioned above. See the source.)

If this disk activity is a problem, you can use a piped open to read cmd's output (writing its input to a file first), or use qx (backticks). But then you'd have to merge or redirect STDERR and jump through more hoops to check for and handle errors.
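As a rough sketch of the piped-open route (not part of the original answer): the command below is only a placeholder that has the running perl interpreter print one line, so substitute the real program and its arguments. Only STDOUT comes back through the pipe; STDERR still goes wherever the parent's STDERR points.

```perl
use strict;
use warnings;

# Placeholder command for illustration: the running perl interpreter ($^X)
# prints a single line. Substitute the real program and its arguments.
my @cmd = ($^X, '-e', 'print "hello\n"');

# List-form piped open: no shell is involved and the child's STDOUT
# arrives through a pipe rather than through a temporary file.
open(my $fh, '-|', @cmd) or die "Cannot start @cmd: $!";
my @lines = <$fh>;

# close() waits for the child and sets $?; it returns false if the
# child exited with a non-zero status or the close itself failed.
close($fh)
    or warn($! ? "Error closing pipe: $!" : "Exit status: " . ($? >> 8));

print @lines;
```

To capture STDERR as well this way, it would have to be merged through the shell (for example with 2>&1 in a single-string command) or sent elsewhere, which is the extra work mentioned above.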

Another option is to use IPC::Run3. While it also uses files, it offers far more options, which may be leveraged to lessen the disk I/O or perhaps avoid the disk altogether. (The idea of invoking it with a filehandle opened to an in-memory scalar doesn't work, since that isn't a real filehandle.)

The "nuclear" option is the more complex IPC::Run, which can take output without using disk.
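A minimal sketch of what that could look like, assuming IPC::Run is installed from CPAN. The child command is a made-up stand-in (the running perl upper-cases one line of input and writes a line to STDERR); replace it with the real cmd.

```perl
use strict;
use warnings;
use IPC::Run qw(run);

# Placeholder child for illustration: reads one line from STDIN,
# prints it upper-cased, and writes one line to STDERR.
my @cmd = ($^X, '-e',
    'my $line = <STDIN>; print uc $line; print STDERR "warn\n";');

my $in = "some text\n";
my ($out, $err) = ('', '');

# Scalar references for stdin, stdout and stderr; run() returns true
# when the child exits with status 0.
run \@cmd, \$in, \$out, \$err
    or die "Command failed, exit status " . ($? >> 8);

print "STDOUT: $out";
print "STDERR: $err";
```

This keeps the same shape as the capture { ... } version in the question: input goes in as a string, and STDOUT and STDERR come back in separate scalars.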


A crude sketch

The "dispatch" of all methods to _capture_tee is done at the beginning, where a set of flags is unshifted onto @_ before goto &func takes over, to distinguish the methods. For capture this is 1,1,0,0, which sets up the variables $do_stdout and $do_stderr in _capture_tee. These are then used to set up the %do hash, whose keys are iterated over to set up $stash.

If extra arguments were passed to capture (for named files) then $stash->{capture} is set to them; otherwise a File::Temp object is assigned. The $stash is later passed to _open_std, where the redirection happens.

There is a lot more, but it is mostly related to the manipulation of localized globs and layers.


The most usual invocation of IPC::Run3 writes to scalar(s)

run3 \@cmd, \my $in, \my $out, \my $err;

but this uses files, as explained in the docs under "How it works".

An attempt to trick it into not using files, by writing to a filehandle that is opened to a scalar

my @cmd = qw(ls -l .);
open my $fh, '>', \my $cmd_out;  # not a real filehandle ...
run3 \@cmd, \undef, $fh;         # ... so this won't work

aborts with

run3(): Invalid argument redirecting STDOUT at ...

This is because an open to a scalar doesn't set up a real filehandle. See this post.

If the filehandle is opened to a file, this works as intended and writes to that file. This may well result in more efficient disk I/O than what Capture::Tiny does.
