
How to define new commands or macros in awk

I would like to define a new command that wraps an existing awk command, such as print. However, I do not want to use a function:

#wrap command with function
function warn(text) { print text > "/dev/stderr" }
NR%1e6 == 0 {
  warn("processed rows: "NR)
}

Instead, I would like to define a new command that can be invoked without parentheses:

#wrap command with new command ???
define warn rest... { print rest... > "/dev/stderr" }
NR%1e6 == 0 {
  warn "processed rows: "NR
}

One solution I can imagine is using a preprocessor, perhaps with the shebang of the awk script set up to invoke this preprocessor followed by awk. However, I was hoping for a pure awk solution.

Note: The solution should also work in mawk, which I use because it is much faster than vanilla GNU awk.

Update: The discussion revealed that gawk (GNU awk) can be quite fast, so mawk is not required.

Looking at mawk's source, I see that commands are keywords held in a compiled-in table, so they cannot be added at runtime. From kw.c:

keywords[] =
{
    { "print",    PRINT },
    { "printf",   PRINTF },
    { "do",       DO },
    { "while",    WHILE },
    { "for",      FOR },
    { "break",    BREAK },
    { "continue", CONTINUE },
    { "if",       IF },
    { "else",     ELSE },
    { "in",       IN },
    { "delete",   DELETE },
    { "split",    SPLIT },
    { "match",    MATCH_FUNC },
    { "BEGIN",    BEGIN },
    { "END",      END },
    { "exit",     EXIT },
    { "next",     NEXT },
    { "nextfile", NEXTFILE },
    { "return",   RETURN },
    { "getline",  GETLINE },
    { "sub",      SUB },
    { "gsub",     GSUB },
    { "function", FUNCTION },
    { (char *) 0, 0 }
};

You could add a new command only by patching mawk's C code and rebuilding it.

You cannot do this within any awk, and you cannot do it robustly outside of awk without writing an awk language parser. By that point you may as well write your own awk-like command, which would no longer really be awk, since it would not behave the same as any other command by that name.
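To make the robustness problem concrete, here is a minimal sketch of the kind of non-parsing preprocessor the question has in mind. The helper name expand_warn and the sed rule are assumptions made up for this illustration, not part of any existing tool. It only rewrites a warn statement that sits alone on its own line:

```shell
#!/bin/sh
# expand_warn: hypothetical helper for illustration only.
# Reads awk source on stdin and rewrites lines of the form
#   warn EXPR
# into
#   print EXPR > "/dev/stderr"
expand_warn() {
  sed -E 's#^([[:space:]]*)warn[[:space:]]+(.*)$#\1print \2 > "/dev/stderr"#'
}

# Sample awk program using the made-up warn statement.
prog='NR % 2 == 0 {
  warn "processed rows: " NR
}'

# Expand the program text first, then hand the result to awk.
printf 'a\nb\nc\nd\n' | awk "$(printf '%s\n' "$prog" | expand_warn)"
```

This works for the sample program, but it is easy to break: a warn that follows an if (...) on the same line is not rewritten at all, and a line such as warn "x"; next has the redirection appended after next, which is a syntax error. Handling such cases correctly is where the awk parser becomes necessary.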

It is odd that you refer to GNU awk as "vanilla" when it has many more useful features than any other currently available awk, while mawk is simply a stripped-down awk optimized for speed, which is only necessary in very rare circumstances.

I created a shell wrapper script called cppawk which combines the C preprocessor (from GCC) with Awk.

BSD licensed, it comes with a man page, regression tests and simple install instructions.

Normally, the C preprocessor creates macros that look like functions, but with certain control flow tricks, which work in Awk much as they do in C, we can pull off minor miracles of syntactic sugar:

function __warn(x)
{
   print x
   return 0
}

#define warn for (__w = 1; __w; __w = __warn(__x)) __x =

NR % 5 == 0 {
  warn "processed rows: "NR
}

Run:

$ cppawk -f warn.cwk 
a
b
c
d
e
processed rows: 5
f
g
h
i
j
processed rows: 10
k
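For readers without cppawk installed, the following is a sketch of what the macro call turns into, written out by hand as plain awk and wrapped in a shell function (the name for_trick and the modulus of 2 are choices made for this illustration). The loop body is the assignment to __x; the loop's update step then calls __warn(__x), which prints and returns 0, so the condition fails and the body runs exactly once:

```shell
#!/bin/sh
# for_trick: hypothetical wrapper name, used only for this illustration.
for_trick() {
  awk '
    function __warn(x) { print x; return 0 }
    NR % 2 == 0 {
      # Hand expansion of: warn "processed rows: " NR
      # 1. init:   __w = 1, so the loop is entered
      # 2. body:   __x receives the message
      # 3. update: __warn(__x) prints it and returns 0, ending the loop
      for (__w = 1; __w; __w = __warn(__x)) __x = "processed rows: " NR
    }
  '
}

printf 'a\nb\nc\nd\n' | for_trick
```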

Because the entire for trick is in a single line of code, we could use the __LINE__ symbol to make the hidden variables quasi-unique:

function __warn(x)
{
   print x
   return 0
}

#define xcat(a, b, c) a ## b ## c
#define cat(a, b, c) xcat(a, b, c)
#define uq(sym) cat(__, __LINE__, sym)
#define warn for (uq(w) = 1; uq(w); uq(w) = __warn(uq(x))) uq(x) =

NR % 5 == 0 {
  warn "processed rows: "NR
}

The expansion is:

$ cppawk --prepro-only -f warn.cwk 
# 1 "<stdin>"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "<stdin>"
function __warn(x)
{
   print x
   return 0
}
NR % 5 == 0 {
  for (__13w = 1; __13w; __13w = __warn(__13x)) __13x = "processed rows: "NR
}

The uq() macro interpolated 13 into the variable names because warn is invoked on line 13 of the file.

Hope you like it.

PS: maybe don't do this; instead, find some less hacky way of using cppawk.

You can use C99/GNUC variadic macros, for instance:

#define warn(...) print __VA_ARGS__ >> "/dev/stderr"

NR % 5 == 0 {
  warn("processed rows:", NR)
}

We made a humble print wrapper which redirects to standard error. It seems like nothing, yet you can't do that with an Awk function: not without making it a one-argument function and passing the value of an expression which catenates everything.
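As a rough comparison in plain awk (a sketch; the names compare_warn and warn1 are invented here), the first statement below is what the macro call warn("processed rows:", NR) expands to by textual substitution, and the second is the one-argument function workaround where the caller has to concatenate the message itself:

```shell
#!/bin/sh
# compare_warn: hypothetical wrapper name, used only for this illustration.
compare_warn() {
  awk '
    # Function workaround: one parameter, the caller builds the whole message.
    function warn1(msg) { print msg >> "/dev/stderr" }
    NR % 2 == 0 {
      # Expansion of the macro call warn("processed rows:", NR):
      # print receives two arguments and joins them with OFS.
      print "processed rows:", NR >> "/dev/stderr"
      # Function version: a single concatenated string.
      warn1("processed rows: " NR)
    }
  '
}

printf 'a\nb\nc\nd\n' | compare_warn
```

With the default OFS of a single space, both statements write the same text to standard error. The difference is that the macro passes its argument list straight through to print, so it follows OFS, while the function always sees one pre-built string.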
