How to define new commands or macros in awk

I would like to define a new command that wraps an existing awk command, such as print. However, I do not want to use a function:

#wrap command with function
function warn(text) { print text > "/dev/stderr" }
NR%1e6 == 0 {
  warn("processed rows: "NR)
}
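For reference, the function-based version above can be exercised as a small shell script. This is a sketch: the wrapper name count_rows, the END rule and the threshold of 5 (instead of 1e6, so the effect shows on a tiny input) are illustrative assumptions. It shows that warn() writes to standard error while regular output stays on standard output.

```shell
#!/bin/sh
# count_rows is a hypothetical wrapper around the question's function-based warn().
# Progress messages go to stderr; the final summary goes to stdout.
count_rows() {
  awk '
    function warn(text) { print text > "/dev/stderr" }
    NR % 5 == 0 { warn("processed rows: " NR) }
    END { print "total rows: " NR }
  '
}

seq 1 12 | count_rows 2>/dev/null   # stdout only: the summary line
seq 1 12 | count_rows >/dev/null    # stderr only: the two progress lines
```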

Instead, I would like to define a new command that can be invoked without parentheses:

#wrap command with new command ???
define warn rest... { print rest... > "/dev/stderr" }
NR%1e6 == 0 {
  warn "processed rows: "NR
}

One solution I can imagine is using a preprocessor, and maybe setting up the shebang of the awk script so that it invokes this preprocessor followed by awk. However, I was hoping for a pure awk solution.
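To make the preprocessor idea concrete, here is a deliberately naive sketch. The names prepro_warn and warnawk are hypothetical, and the rewrite rule is an assumption: any line that starts with warn (after optional indentation) becomes a print to standard error. It does not parse awk at all, which is exactly why it is fragile.

```shell
#!/bin/sh
# Hypothetical, line-based preprocessor for the syntax in the question:
#   warn EXPR   ->   print EXPR > "/dev/stderr"
# It only handles `warn` at the start of a line and treats the rest of the line
# as its argument; strings, comments, semicolons and one-line blocks are NOT handled.
prepro_warn() {
  sed -E 's|^([[:space:]]*)warn[[:space:]]+(.*)$|\1print \2 > "/dev/stderr"|'
}

# warnawk SCRIPT [FILE...]: preprocess SCRIPT, then run it with awk (hypothetical name).
warnawk() {
  script=$1
  shift
  awk "$(prepro_warn < "$script")" "$@"
}

# Demo: the question's desired syntax, minus the `define` line.
demo=$(mktemp)
cat > "$demo" <<'EOF'
NR % 5 == 0 {
  warn "processed rows: " NR
}
EOF
seq 1 12 | warnawk "$demo"   # progress lines appear on stderr
rm -f "$demo"
```

Writing `{ warn "x" }` on a single line already slips past this rule, so anything beyond a toy needs a real parser or an established preprocessor such as cpp.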

Note: The solution should also work in mawk, which I use because it is much faster than vanilla GNU awk.

Update: The discussion revealed that gawk (GNU awk) can be quite fast, so mawk is not required.

Looking at Mawk's source, I see that commands are special keywords and cannot be added at runtime. From kw.c:

keywords[] =
{
    { "print",    PRINT },
    { "printf",   PRINTF },
    { "do",       DO },
    { "while",    WHILE },
    { "for",      FOR },
    { "break",    BREAK },
    { "continue", CONTINUE },
    { "if",       IF },
    { "else",     ELSE },
    { "in",       IN },
    { "delete",   DELETE },
    { "split",    SPLIT },
    { "match",    MATCH_FUNC },
    { "BEGIN",    BEGIN },
    { "END",      END },
    { "exit",     EXIT },
    { "next",     NEXT },
    { "nextfile", NEXTFILE },
    { "return",   RETURN },
    { "getline",  GETLINE },
    { "sub",      SUB },
    { "gsub",     GSUB },
    { "function", FUNCTION },
    { (char *) 0, 0 }
};

You could add a new command by patching Mawk's C code.
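A related detail, shown here as a small sketch (the wrapper name silent_warn is hypothetical): because warn is not in that keyword table, the question's desired syntax does not fail loudly. Plain awk already parses `warn "processed rows: " NR` as an expression statement that concatenates the uninitialized variable warn with the rest of the expression and discards the result.

```shell
#!/bin/sh
# Without a real keyword, `warn EXPR` is valid awk: an expression statement that
# concatenates the empty variable `warn` with EXPR and throws the value away.
# Nothing is printed and awk exits successfully.
silent_warn() {
  awk 'NR % 5 == 0 { warn "processed rows: " NR }'
}

seq 1 12 | silent_warn
echo "exit status: $?"
```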

You cannot do this within any awk, and you cannot do it robustly outside of awk without writing an awk language parser. By that point you may as well write your own awk-like command, which would then no longer really be awk, since it would not behave the same as any other command by that name.

It is odd that you refer to GNU awk as "vanilla" when it has many more useful features than any other currently available awk, while mawk is simply a stripped-down awk optimized for speed, which is only necessary in very rare circumstances.

I created a shell wrapper script called cppawk, which combines the C preprocessor (from GCC) with Awk.

BSD-licensed, it comes with a man page, regression tests and simple install instructions.

Normally, the C preprocessor creates macros that look like functions; but using certain control-flow tricks, which work in Awk much as they do in C, we can pull off minor miracles of syntactic sugar:

function __warn(x)
{
   print x
   return 0
}

#define warn for (__w = 1; __w; __w = __warn(__x)) __x =

NR % 5 == 0 {
  warn "processed rows: "NR
}

Run:

$ cppawk -f warn.cwk 
a
b
c
d
e
processed rows: 5
f
g
h
i
j
processed rows: 10
k
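To see why the for trick runs its body exactly once, here is a sketch of the warn macro expanded by hand, so it runs under plain awk with no cppawk or cpp involved. The wrapper name expanded_warn and the input are illustrative assumptions; the loop itself is the macro's expansion.

```shell
#!/bin/sh
# Hand-expanded `warn "processed rows: " NR` (the macro, not cppawk, is the source).
# The for loop executes its body exactly once:
#   init:      __w = 1               -> condition is true, enter the loop
#   body:      __x = "processed rows: " NR
#   increment: __w = __warn(__x)     -> prints __x and returns 0
#   test:      __w is 0              -> the loop ends
expanded_warn() {
  awk '
    function __warn(x) { print x; return 0 }
    NR % 5 == 0 {
      for (__w = 1; __w; __w = __warn(__x)) __x = "processed rows: " NR
    }
  '
}

seq 1 11 | expanded_warn
```

With piped input nothing is echoed, so only the two progress lines appear. In the terminal session above, the letters a through k are the lines typed on standard input, not program output.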

Because the entire for trick is on a single line of code, we can use the __LINE__ symbol to make the hidden variables quasi-unique:

function __warn(x)
{
   print x
   return 0
}

#define xcat(a, b, c) a ## b ## c
#define cat(a, b, c) xcat(a, b, c)
#define uq(sym) cat(__, __LINE__, sym)
#define warn for (uq(w) = 1; uq(w); uq(w) = __warn(uq(x))) uq(x) =

NR % 5 == 0 {
  warn "processed rows: "NR
}

The expansion is:

$ cppawk --prepro-only -f warn.cwk 
# 1 "<stdin>"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "<stdin>"
function __warn(x)
{
   print x
   return 0
}
NR % 5 == 0 {
  for (__13w = 1; __13w; __13w = __warn(__13x)) __13x = "processed rows: "NR
}

The uq() macro interpolated 13 into the variable names because warn is invoked on line 13.

Hope you like it.

P.S. Maybe don't do this; instead, find some less hacky way of using cppawk.

You can use C99/GNU C variadic macros, for instance:

#define warn(...) print __VA_ARGS__ >> "/dev/stderr"

NR % 5 == 0 {
  warn("processed rows:", NR)
}

We made a humble print wrapper which redirects to standard error. It seems like nothing, yet you can't do that with an Awk function, not without making it a one-argument function and passing it the value of an expression that concatenates everything.
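The difference can be sketched in plain awk by expanding the variadic macro by hand. The wrapper names macro_warn and func_warn are hypothetical. The macro form is a real multi-argument print, so its arguments are joined with OFS; the function form accepts one string, so the caller must build the separator into the expression.

```shell
#!/bin/sh
# Hand-expanded warn("processed rows:", NR) from the variadic macro:
#   print "processed rows:", NR >> "/dev/stderr"
# print joins its arguments with OFS (a single space by default).
macro_warn() {
  awk 'NR % 5 == 0 { print "processed rows:", NR >> "/dev/stderr" }'
}

# Closest awk-function equivalent: a one-argument function, so the caller
# concatenates everything (including the OFS separator) into a single value.
func_warn() {
  awk '
    function warn(text) { print text >> "/dev/stderr" }
    NR % 5 == 0 { warn("processed rows:" OFS NR) }
  '
}

seq 1 10 | macro_warn
seq 1 10 | func_warn
```

Both write the same two lines to standard error; only the macro lets the call site pass a comma-separated argument list.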
