How to emulate wc -l in Raku

In Perl 5, you can emulate wc -l with a one-liner:

perl -lnE 'END {say $.}' test.txt

How can this functionality be implemented in Raku?

If you try to implement it like this:

raku -e 'say "test.txt".IO.open.lines.elems'

it turns out to be slow and uses a lot of memory.

Information to reproduce:

$ wget http://eforexcel.com/wp/wp-content/uploads/2017/07/1500000%20Sales%20Records.zip
$ unzip "1500000 Sales Records.zip"
$ mv "1500000 Sales Records.csv" part.txt
$ for i in `seq 1 10`; do cat part.txt >> test.txt ; done
$ du -sh test.txt
1.8G    test.txt

$ time wc -l test.txt
15000000 test.txt

real    0m0,350s
user    0m0,143s
sys     0m0,205s

$ time perl -lnE 'END { say $. }' test.txt
15000001

real    0m1,981s
user    0m1,719s
sys     0m0,256s

$ time raku -e 'say "test.txt".IO.open.lines.elems'
15000001

real    2m51,852s
user    0m25,129s
sys     0m6,378s

# Using swap (maximum uses 2.2G swap):
# Before `raku -e ''`

$ free -m
              total        used        free      shared  buff/cache   available
Mem:          15009        1695       12604         107         708       12917
Swap:          7583           0        7583

# After `raku -e ''`

$ free -m
              total        used        free      shared  buff/cache   available
Mem:          15009         752       13923          72         332       13899
Swap:          7583         779        6804

# Swap not used
$ time raku -ne '++$ andthen END .say' test.txt
15000001

real    1m44,906s
user    2m14,165s
sys     0m0,653s

$ raku -v
This is Rakudo version 2019.11 built on MoarVM version 2019.11
implementing Perl 6.d.

One option that's still likely to be pretty slow compared to Perl, but worth comparing:

raku -ne '++$ andthen END .say' test.txt

The -l command-line option used in the Perl version is redundant here.

$ is an anonymous state scalar.

andthen tests that its left-hand side is defined and, if so, sets that value as the topic ($_) and then evaluates its right-hand side.

END is similar to Perl's END. Note that it returns Nil to the andthen, but that doesn't matter here because we're using the END statement for its side effect.
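The pieces of that one-liner can be tried separately. The following is a minimal sketch for illustration only; `tick` and `count-lines` are hypothetical names, not part of Raku or of the one-liner above, and `count-lines` is only a rough script-form equivalent of counting one increment per line read.

```raku
# Each textual `$` is its own anonymous state variable: it keeps its
# value between calls of the enclosing block.
sub tick { ++$ }
say tick() for ^3;                 # prints 1, 2, 3 on separate lines

# `andthen` evaluates its right side only if the left side is defined,
# with the left side's value available as the topic $_.
my $n = 41;
++$n andthen say "topic is $_";    # prints "topic is 42"

# Hypothetical helper: one increment per line read, total returned at the end.
sub count-lines(IO::Path $path --> Int) {
    my $count = 0;
    ++$count for $path.lines;      # lines are consumed one at a time
    return $count;
}

# Usage (file name taken from the question):
#   say count-lines("test.txt".IO);
```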

Several things will affect this code's speed. Some that I can think of:

  • Compiler startup overhead. Ignoring any modules being used, the Raku compiler Rakudo has a startup overhead of about a tenth of a second on typical hardware, compared to a fairly negligible one for Perl.

  • The notion of a "line". In Perl, the default notion of line processing is reading a series of bytes, some of which represent a line end. In Raku, the default notion of line processing is reading a UTF-8 string, parts of which represent line ends. Thus Perl incurs only the reading overhead of an ASCII (or Extended ASCII) decoder, whereas Raku incurs the reading overhead of a UTF-8 decoder.

  • Compiler optimizations. Perl is generally optimized to the max. It wouldn't surprise me if perl -lnE 'END {say $.}' test.txt takes advantage of some clever optimizations. In contrast, work on Rakudo optimization is, relatively speaking, still in its early days.
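The byte-oriented notion of a line from the second point can be sketched in Raku by opening the file in binary mode and counting newline bytes, with no decoding at all. This is an untimed illustration, not a claim about speed; `count-newline-bytes` and `chunk-size` are hypothetical names. It counts newline bytes the way wc -l does, so a final line without a trailing newline is not counted.

```raku
# Hypothetical helper: count 0x0A bytes in a file without decoding it.
sub count-newline-bytes(IO::Path $path, Int :$chunk-size = 1_048_576 --> Int) {
    my $fh = $path.open(:bin);            # binary mode: read returns raw bytes
    my $count = 0;
    # read returns an empty (false) Buf at end of file, which ends the loop
    while $fh.read($chunk-size) -> $buf {
        $count += $buf.list.grep(* == 10).elems;   # 10 is the newline byte
    }
    $fh.close;
    return $count;
}

# Usage (file name taken from the question):
#   say count-newline-bytes("test.txt".IO);
```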

The only things I think anyone can do about the first and last of the three points I've mentioned above are to wait N years and/or contribute to the compiler's improvement.

There will be a way to work around Raku's UTF-8-by-default behavior. Perhaps something like the following is already doable and significantly faster than Raku's default, at least ignoring the overhead of using a module, here called foo:

raku -Mfoo -ne '++$ andthen END .say' test.txt

where module foo switches the default encoding for file I/O to ASCII or to another of the available encodings.

I haven't checked that this is actually doable in current Rakudo, but I would be surprised if it were not.
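A related approach that needs no module is to pass an encoding per call, since IO::Path.lines accepts an :enc argument. The sketch below assumes the data is ASCII-compatible, so that decoding it as iso-8859-1 leaves the line endings intact; `count-lines-as` is a hypothetical name, and whether this is actually faster than the UTF-8 default would need to be measured.

```raku
# Hypothetical helper: count lines while decoding with a chosen encoding.
# Assumption: the file's line endings are plain "\n" or "\r\n" bytes.
sub count-lines-as(IO::Path $path, Str $enc = 'iso-8859-1' --> Int) {
    my $count = 0;
    ++$count for $path.lines(:$enc);   # same loop as before, different decoder
    return $count;
}

# Usage (file name taken from the question):
#   say count-lines-as("test.txt".IO);
#   say count-lines-as("test.txt".IO, 'ascii');
```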
