Should awk expand escape sequences in command-line assigned variables?

I've recently discovered that Awk's -v VAR=VAL syntax for initializing variables on the command line expands escape sequences in VAL. I previously thought that it was a good way to pass strings into Awk without needing to run an escaping function over them first.

For example, the following script:

awk -v VAR='x\tx' 'BEGIN{printf("%s\n", VAR);}'

I would expect to print:

x\tx

but it actually prints:

x       x

An aside: environment variables can be used to pass strings in unmodified instead; this question isn't asking how to get the behaviour I previously expected.
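For completeness, here is a minimal sketch of that environment-variable route. It assumes a POSIX shell and an awk that provides the standard ENVIRON array; the variable name VAR is only carried over from the example above.

```shell
# Set VAR in awk's environment instead of using -v.
# Values read through ENVIRON are not subject to escape-sequence processing.
out=$(VAR='x\tx' awk 'BEGIN { printf("%s\n", ENVIRON["VAR"]) }')

# Prints the backslash and the t literally: x\tx
printf '%s\n' "$out"
```

The assignment before the command name applies only to that one awk invocation, so nothing leaks into the calling shell.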

Here's what the man page has to say on the matter:

-v var=val, --assign var=val
    Assign the value val to the variable var, before execution of the program begins. Such variable values are available to the BEGIN block of an AWK program.

And further down:

String Constants
    String constants in AWK are sequences of characters enclosed between double quotes (like "value"). Within strings, certain escape sequences are recognized, as in C. These are:

    ... list of escape sequences ...

    The escape sequences may also be used inside constant regular expressions (e.g., /[ \t\f\n\r\v]/ matches whitespace characters).

    In compatibility mode, the characters represented by octal and hexadecimal escape sequences are treated literally when used in regular expression constants. Thus, /a\52b/ is equivalent to /a\*b/.

The way I read this, val in -v var=val is not a string constant, and there is no text to indicate that the string constant escaping rules apply.

My questions:

  1. Is there a more authoritative source for the awk language than the man page, and if so, what does it specify?
  2. What does POSIX have to say about this, if anything?
  3. Do all versions of Awk behave this way, i.e. can I rely on the expansion being done if I actually want it?

The value in the assignment is interpreted as a string constant.

The relevant sections from the POSIX standard are:

-v assignment
    The application shall ensure that the assignment argument is in the same form as an assignment operand. The specified variable assignment shall occur prior to executing the awk program, including the actions associated with BEGIN patterns (if any). Multiple occurrences of this option can be specified.

and

    An operand that begins with an underscore or alphabetic character from the portable character set (see the table in XBD Portable Character Set), followed by a sequence of underscores, digits, and alphabetics from the portable character set, followed by the '=' character, shall specify a variable assignment rather than a pathname. The characters before the '=' represent the name of an awk variable; if that name is an awk reserved word (see Grammar) the behavior is undefined. The characters following the <equals-sign> shall be interpreted as if they appeared in the awk program preceded and followed by a double-quote ( '"' ) character, as a STRING token (see Grammar), except that if the last character is an unescaped <backslash>, it shall be interpreted as a literal <backslash> rather than as the first character of the sequence "\"".
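A short sketch of what that rule means in practice, assuming a POSIX shell and a standard-conforming awk (the variable names expanded and literal are only illustrative): the value after '=' goes through the same escape processing as a quoted string in the program text, so a literal backslash has to be doubled.

```shell
# \t in the value is processed as if the program contained "x\tx",
# so VAR holds x, a tab character, then x.
expanded=$(awk -v VAR='x\tx' 'BEGIN { printf("%s\n", VAR) }')

# Doubling the backslash is processed as "x\\tx",
# so VAR holds x, a literal backslash, t, then x.
literal=$(awk -v VAR='x\\tx' 'BEGIN { printf("%s\n", VAR) }')

printf '%s\n' "$expanded" "$literal"
```

So if you do want -v with arbitrary input, every backslash in the value needs to be escaped before it is passed in.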
