awk vs nawk vs mawk for processing heavy files

Question

I'm dealing with a few really large files that make my MacBook Pro throttle. I was thinking about using a faster implementation of awk, and I have heard mawk is much faster. Can I just install mawk, change awk to mawk in my commands, and use it? Will this simply speed up processing?

Answer 1

First, if you can, set LC_ALL=C and see whether this provides enough of a boost:

$ LC_ALL=C awk 'foo'
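As a rough sketch of how one might check this before committing to it, the snippet below runs the same filter with the default locale and with the C locale and confirms that both give the same count. The generated sample file and the `/foo/` pattern are made up for illustration; substitute your own file and program.

```shell
# Generate a throwaway sample file (hypothetical data; use a real file instead).
sample=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 200000; i++) print "row", i, (i % 7 ? "bar" : "foo") }' > "$sample"

# Same program, first under the default locale, then under the C locale.
count_default=$(awk '/foo/ { n++ } END { print n + 0 }' "$sample")
count_c=$(LC_ALL=C awk '/foo/ { n++ } END { print n + 0 }' "$sample")

echo "default locale: $count_default, C locale: $count_c"
rm -f "$sample"
```

In bash or zsh, prefixing each awk command with `time` shows whether the C locale actually helps on your data. Note that under LC_ALL=C functions such as length() count bytes rather than characters, so scripts that depend on multibyte handling may behave differently.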

mawk is quite fast, but I have found that it does not necessarily run awk scripts as expected, so I always need to double-check that it is doing the right thing.
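One way to do that double-check, sketched here under the assumption that the script writes its result to standard output: run the same program under every awk found on PATH and compare checksums of the output. The function name `compare_awks` is invented for this example.

```shell
# compare_awks PROGRAM FILE
# Runs PROGRAM on FILE under each awk implementation that is installed and
# reports whether its output matches the first implementation that ran.
compare_awks() {
    prog=$1
    file=$2
    ref=
    for impl in awk gawk mawk nawk; do
        # Skip implementations that are not installed.
        command -v "$impl" >/dev/null 2>&1 || continue
        sum=$(LC_ALL=C "$impl" "$prog" "$file" | cksum)
        if [ -z "$ref" ]; then
            ref=$sum    # the first implementation found is the reference
        fi
        if [ "$sum" = "$ref" ]; then
            echo "$impl: same as first"
        else
            echo "$impl: DIFFERS"
        fi
    done
}
```

For a very large file it is probably enough to run this on a sample first, for example the output of `head -n 100000`, and switch to mawk only if nothing is reported as DIFFERS.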

gawk seems to me to have increased its speed in the past few years, but YMMV.

Answer 2

mawk 1.9.9.6 (mawk-2 beta) is by far the fastest one.

With it I got URI quote-plus encoding running much faster than even the built-in module in python3. Nowadays it takes my 2018 Mac about 13.9 seconds to traverse a 12.3-million-row text file that's 1.82 GB in size and count exactly every byte, plus every UTF-8 code point (roughly 1.2 billion of them), despite mawk itself not being Unicode-aware.

Even gnu-awk in Unicode-aware mode or the macOS built-in wc -lm doesn't go as fast.
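The answer does not show how the counting was done. One possible approach, offered as an assumption rather than the answerer's actual method, relies on the fact that every UTF-8 code point contains exactly one byte that is not a continuation byte (0x80 to 0xBF). A byte-oriented awk can therefore count bytes and subtract the continuation bytes. The function name is made up, and the sketch assumes valid UTF-8 input, a trailing newline on the last line, and an awk that treats input as raw bytes under LC_ALL=C (as mawk and gawk do).

```shell
# count_bytes_and_codepoints [FILE...]
# Reads the named files (or standard input) and prints byte and code point totals.
count_bytes_and_codepoints() {
    LC_ALL=C awk '
        {
            # length() counts bytes here; +1 restores the newline awk stripped.
            bytes += length($0) + 1
            # gsub returns the number of matches: continuation bytes 0x80-0xBF.
            cont += gsub(/[\200-\277]/, "&")
        }
        END {
            # %.0f avoids 32-bit overflow of %d in some older awk builds.
            printf "%.0f bytes, %.0f code points\n", bytes, bytes - cont
        }
    ' "$@"
}
```

For example, `printf 'h\303\251llo\n' | count_bytes_and_codepoints` should report 7 bytes and 6 code points with mawk or gawk, since the two-byte "é" contributes one continuation byte.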

Answer 1: score 0, accepted, 2015-11-22 00:43:16

Answer 2: score 0, 2021-02-02 23:51:43
