
How to speed up parallel parsing using Raku grammars?

Parsing a few hundred files with my grammar using a plain

for @files -> $file {
    my $input = $file.IO.slurp;
    my $output = parse-and-convert($input);
    $out-dir.IO.add($file ~ '.out').spurt: $output;
}

loop is relatively slow and takes ~20 seconds on my machine, so I've decided to speed this up by doing this instead:

my @promises;
for @files -> $file {
    my $input = $file.IO.slurp;
    @promises.append: start parse-and-convert($input);
}

for @files Z (await @promises) -> ($file, $output) {
    $out-dir.IO.add($file ~ '.out').spurt: $output;
}

This works (at least in my real code, i.e. modulo any typos in this illustrative example), but the speedup is much less than I hoped for: it now takes ~11s, i.e. I've only gained a factor of two. This is appreciable, of course, but it looks like there is a lot of contention, because the program uses fewer than 6 CPUs (on a system with 16 of them), and quite a bit of overhead (because I don't get a factor-of-6 speedup either).

I've confirmed (by inserting some say now - INIT.now) that almost all the running time is really spent inside await, as expected, but I have no idea how I could debug/profile it further. I'm doing this under Linux, so I could use perf, but I'm not sure how it would help me at the Raku level.

Would there be some simple way to improve the degree of parallelism here?

Edit: Just to make it clear, I can live with a 20s (well, 30s by now, as I've added more things) running time; I'm really mostly curious about whether the degree of parallelism could somehow be improved here without rewriting the grammar (unless there is something very specific, e.g. the use of dynamic variables, that should be avoided when using multiple threads).

A question, and a suggestion:

Does your Grammar parse entire documents, or only portions of those documents (sections, paragraphs, lines, etc.)?

If your Grammar only parses at the paragraph or line level, then you might be spending a lot of time slurping your files in. The advantage of Raku's lines routine is that it reads lazily. To replicate and replace slurp in your second line of code, you could try something like:

my $input = $file.IO.lines.join("\n");

Otherwise, if your Grammar parses at the paragraph level, you can use the power of arrays in Raku (note below the assignment to @input instead of $input). You can also use the >> (hyper) operator to process array elements for a speedup, because, as the docs say, "...all hyper operators are candidates for parallelism...":

my @input = $file.IO.slurp.split("\n\n");
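
For example, a minimal sketch of hyper-processing those paragraphs (parse-paragraph is a hypothetical stand-in for whatever routine wraps your grammar call; it is not part of your code):

    # parse-paragraph is a hypothetical per-paragraph wrapper around the grammar.
    # ».&(...) applies it to every element; hyper operators are candidates
    # for parallelism, so the runtime is free to run the calls concurrently.
    my @outputs = @input».&parse-paragraph;

Note that hyper operators assume the per-element operation is side-effect-free, which a pure parse-and-convert step normally is.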

If you have complex paragraph (pre-)parsing to do, check out the Text::Paragraphs submodule of @Codesections' _ "lowbar" module:

https://github.com/codesections/_/blob/main/lib/Text/Paragraphs/README.md

In any case, it seems your best opportunity for speedup is reducing the 'impedance mismatch', i.e. making sure that the chunk size you're feeding to your Grammar matches the size the Grammar expects (rather than creating a file-read bottleneck before the Grammar is executed).

HTH.

If you don't care about the order in which things occur, you can use race on any Iterable (in this case, your @files). By default this will create work for CPU-cores - 1 threads, handing each thread batches of 64 items to process at a time.

Since grammar parsing is a notoriously expensive process, it's probably wise to let each thread handle 1 file at a time. You can specify that with the batch argument.

Relatedly, Intel processors typically claim to have 2x more CPUs than are actually available for this kind of workload. So you might want to play with the degree argument (which indicates the maximum number of threads to be used) as well, because parsing a grammar creates exactly that kind of workload.

So your code:

for @files.race(batch => 1, degree => 8) -> $file {
    my $input = $file.IO.slurp;
    my $output = parse-and-convert($input);
    $out-dir.IO.add($file ~ '.out').spurt: $output;
}

Note that the only thing you needed to add to your original code was: .race(batch => 1, degree => 8)
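
If the order of the results matters to you (for example, if you later want to report progress file by file), hyper is the order-preserving counterpart of race and takes the same batch and degree arguments:

    # Same as the race version above, but results are delivered
    # in the original order of @files.
    for @files.hyper(batch => 1, degree => 8) -> $file {
        my $input  = $file.IO.slurp;
        my $output = parse-and-convert($input);
        $out-dir.IO.add($file ~ '.out').spurt: $output;
    }

Since each iteration here writes to its own output file, race is fine for this particular loop; hyper only matters once ordering becomes observable.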
