How to emulate wc -l in Raku
In Perl 5, you can emulate wc -l with a one-liner:
perl -lnE 'END {say $.}' test.txt
How can this be implemented in Raku?
If you try this:
raku -e 'say "test.txt".IO.open.lines.elems'
it turns out to be slow and uses a lot of memory.
Information to reproduce:
$ wget http://eforexcel.com/wp/wp-content/uploads/2017/07/1500000%20Sales%20Records.zip
$ unzip "1500000 Sales Records.zip"
$ mv "1500000 Sales Records.csv" part.txt
$ for i in `seq 1 10`; do cat part.txt >> test.txt ; done
$ du -sh test.txt
1.8G test.txt
$ time wc -l test.txt
15000000 test.txt
real 0m0,350s
user 0m0,143s
sys 0m0,205s
$ time perl -lnE 'END { say $. }' test.txt
15000001
real 0m1,981s
user 0m1,719s
sys 0m0,256s
$ time raku -e 'say "test.txt".IO.open.lines.elems'
15000001
real 2m51,852s
user 0m25,129s
sys 0m6,378s
# Using swap (maximum uses 2.2G swap):
# Before `raku -e ''`
$ free -m
total used free shared buff/cache available
Mem: 15009 1695 12604 107 708 12917
Swap: 7583 0 7583
# After `raku -e ''`
$ free -m
total used free shared buff/cache available
Mem: 15009 752 13923 72 332 13899
Swap: 7583 779 6804
# Swap not used
$ time raku -ne '++$ andthen END .say' test.txt
15000001
real 1m44,906s
user 2m14,165s
sys 0m0,653s
$ raku -v
This is Rakudo version 2019.11 built on MoarVM version 2019.11
implementing Perl 6.d.
One option that's still likely to be pretty slow compared to perl but worth comparing:
raku -ne '++$ andthen END .say' test.txt
The -l command line option is redundant.
$ is an anonymous state scalar.
andthen tests that its lhs is defined, and if so, sets that value as the topic ($_) and then evaluates its rhs.
END is similar to perl's END. Note that it returns Nil to the andthen, but that doesn't matter here because we're using the END's statement for its side-effect.
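For readers who find the terse form hard to follow, the same count can be spelled out with a named counter and an explicit loop. This is only a sketch of an equivalent, not a faster variant; demo.txt is a hypothetical test file, and the guard simply skips the command when raku isn't installed.

```shell
# demo.txt is a made-up three-line test file.
printf 'a\nb\nc\n' > demo.txt

if command -v raku >/dev/null 2>&1; then
  # $n plays the role of the anonymous state scalar; say runs once after the loop,
  # which is what the END phaser achieves in the -n one-liner.
  raku -e 'my $n = 0; ++$n for "demo.txt".IO.lines; say $n'
fi
```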
Several things will impact this code's speed. Some things I can think of:
Compiler startup overhead. Ignoring any modules being used, the raku compiler Rakudo has a startup overhead of about a tenth of a second on typical hardware, compared to a fairly negligible one for perl.
The notion of a "line". In perl, the default notion of line processing is reading a series of bytes, some of which represent a line end. In raku, the default notion of line processing is reading a UTF-8 string, parts of which represent line ends. Thus perl only incurs the reading overhead of an ASCII (or Extended ASCII) decoder whereas raku incurs the reading overhead of a UTF-8 decoder.
Compiler optimizations. perl is generally optimized to the max. It wouldn't surprise me if perl -lnE 'END {say $.}' test.txt takes advantage of some clever optimizations. In contrast, work on Rakudo optimization is still in its early days, relatively speaking.
The only things I think anyone can do about the first and last of the three points I've mentioned above are to wait N years and/or contribute to the compiler's improvement.
There will be a way to work around raku's UTF-8-by-default. Perhaps something like the following is already doable and significantly faster than raku's default, at least ignoring the overhead of using a module called foo:
raku -Mfoo -ne '++$ andthen END .say' test.txt
where module foo switches the default encoding for file I/O to ASCII or whatever from the available encodings.
I haven't checked that this is actually doable in current Rakudo, but would be surprised if it were not.
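Short of a module, one thing that can be tried directly is passing an encoding when opening the file, since open accepts an :enc argument. The sketch below uses iso-8859-1, a single-byte encoding that accepts every byte value. Whether this is materially faster than the UTF-8 default is an assumption that has not been measured here. demo.txt is a hypothetical test file, and the guard skips the command when raku isn't installed.

```shell
# demo.txt is a made-up three-line test file.
printf 'a\nb\nc\n' > demo.txt

if command -v raku >/dev/null 2>&1; then
  # Same shape as the original attempt, but decoding as iso-8859-1 instead of UTF-8.
  raku -e 'say "demo.txt".IO.open(:enc<iso-8859-1>).lines.elems'
fi
```

For the real test, replace demo.txt with test.txt and wrap the command in time to compare against the figures in the question.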