
The differences between gawk and mawk (column width)

I have a file:

To jest długi string z wieloma polskimi literami ąółżęś kodowany w UTF8, 
żeby 
było śmieszniej, haha.
ą
a

Example gawk:

gawk '{printf "%-80s %-s\n", $0, length}' file

In gawk, I get the correct result:

To jest długi string z wieloma polskimi literami ąółżęś kodowany w UTF8,         73
żeby                                                                             5
było śmieszniej, haha.                                                           22
ą                                                                                1
a                                                                                1


Example mawk:

mawk '{printf "%-80s %-s\n", $0, length}' file
To jest długi string z wieloma polskimi literami ąółżęś kodowany w UTF8,  80
żeby                                                                            6
było śmieszniej, haha.                                                         24
ą                                                                               2
a                                                                                1

In mawk, I get the incorrect result.

How can I get mawk to produce the same result as gawk?

mawk is a minimal-featured awk designed for speed of execution over functionality. You should not expect it to behave exactly the same as gawk or a POSIX awk. If you're going to use mawk, you need to get a mawk manual describing how IT behaves; don't rely on any other documentation describing how other awks behave.

IMHO there is no correct result for the formatting string %-s, as it is meaningless to align a string without specifying a width within which to align it. There are also different interpretations of what length means on its own: it could be shorthand for length($0), it could be something else in a non-POSIX awk, or there might not even be a length function in some non-POSIX awk, in which case it might be taken as an undefined variable name. And how does any given awk handle non-English characters?

As I said, if you're going to use a non-POSIX awk, you need to check the manual for THAT awk for all of the gory details...
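One quick way to see how a particular awk treats multibyte input is to feed it a single non-ASCII character and print its length. This is only a sketch, not something from the original answer. It spells out length($0) instead of relying on the bare length shorthand, and the result of the first command depends on which awk is installed and which locale is active.

```shell
# "ą" is U+0105, encoded in UTF-8 as the two bytes C4 85 (octal 304 205).

# Result depends on the awk implementation and the current locale:
# a character-aware awk in a UTF-8 locale counts characters,
# a byte-oriented awk counts bytes.
printf '\304\205\n' | awk '{ print length($0) }'

# In the C locale the input is treated as plain bytes, so this prints 2.
printf '\304\205\n' | LC_ALL=C awk '{ print length($0) }'
```

If the two commands print different numbers, the awk in question is counting characters in the current locale; if both print 2, it is counting bytes.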

I assume you are using different systems, because the awk installed on a system is usually a symlink to either gawk or mawk.

All awk versions are compatible as long as the versions coincide.

I therefore assume that the issue you are facing is due to using an older and a newer version of the programs.

UPDATE 1: realized I could massively streamline it:

  • The only thing needed is to add the count of UTF-8 continuation bytes back into the total width. With FS defined as the continuation-byte range, that count is always NF - 1 for non-empty lines, and the number printed at the end of each line is the UTF-8 character count (strictly speaking, a code-point count).

    Caveat: this code takes a leap of faith and assumes the input is valid UTF-8 to begin with; it performs no data validation.

= =

mawk[1/2]|gawk -b '

$!NF = sprintf("%-*s %s",(__=NF-!_)+80,$_,length($_)-__)' FS='[\\200-\\277]'

= =

To jest długi string z wieloma polskimi literami ąółżęś kodowany w UTF8,         73
żeby                                                                             5
było śmieszniej, haha.                                                           22
ą                                                                                1
a                                                                                1
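
For readers who find the one-liner above hard to follow, here is a more verbose sketch of the same idea. It is not the answerer's code: the function name utf8_pad is made up for illustration, it counts continuation bytes with gsub rather than with FS, and it forces the C locale so that every awk works on bytes. Like the original, it assumes the input is valid UTF-8.

```shell
# utf8_pad is a hypothetical helper name, not part of mawk or gawk.
# Assumes valid UTF-8 input; no validation is performed.
utf8_pad() {
    # LC_ALL=C makes awk treat the input as bytes regardless of implementation.
    LC_ALL=C awk '{
        line  = $0
        bytes = length(line)
        # gsub returns the number of replacements made, that is,
        # the number of UTF-8 continuation bytes (octal 200-277).
        cont  = gsub(/[\200-\277]/, "", line)
        # Widen the 80-column field by the continuation bytes so the
        # padding lines up on screen; build the format as a string
        # instead of relying on "*" width support.
        fmt   = "%-" (80 + cont) "s %d\n"
        printf fmt, $0, bytes - cont
    }' "$@"
}

# Demo: "ą" (bytes C4 85, written here in octal) and plain "a".
printf '\304\205\na\n' | utf8_pad
```

Both demo lines end in a count of 1. To process the file from the question, call it as utf8_pad file.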
