
Regular expression for constants in C

I want to write a regular expression for constants in the C language. So I tried this:

Let

  • digit -> 0-9,
  • digit_oct -> 0-7,
  • digit_hex -> 0-9 | a-f | A-F

Then:

  • RE = digit+ U 0digit_oct+ U 0xdigit_hex+

I want to know whether I have written a correct RE. Is there any other way of writing this?

There is another type of integer constant, namely integer character constants such as 'a' or '\n'. In C99 these are constants and their type is just int.

The best regular expressions for all of these are found in the standard, section 6.4: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf

The 'RE' makes sense if we interpret the 'U' as being similar to set union. However, it is more conventional to use a '|' symbol to denote alternatives.
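As a rough illustration of the '|' notation, the asker's expression could be written like this with Python's `re` module. The names `ASKER_RE` and `matches` are made up for this sketch, and the pattern is transcribed as posted, without any corrections.

```python
import re

# The asker's three alternatives, joined with '|' instead of 'U'.
# digit+ | 0 digit_oct+ | 0x digit_hex+
ASKER_RE = re.compile(r"[0-9]+|0[0-7]+|0x[0-9a-fA-F]+")


def matches(text: str) -> bool:
    """Return True if the whole string matches the asker's expression."""
    return ASKER_RE.fullmatch(text) is not None
```

With this transcription, `matches("123")` and `matches("0x1F")` are true, while `matches("0X1F")` and `matches("10UL")` are false.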

First, you are only dealing with integer constants, not with floating point or character or string constants, let alone more complex constants.

Second, you have omitted '0X' as a valid hex prefix.

Third, you have omitted the various suffixes: U, L, LL, ULL (and their lower-case and mixed-case synonyms and permutations).
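A minimal sketch of how the prefix and suffix points might look for the hexadecimal case, again using Python's `re` module. The names `SUFFIX`, `HEX_RE` and `is_hex_constant` are hypothetical. The suffix pattern assumes the standard's grammar, where the long-long suffix is either `ll` or `LL` (so a mixed `lL` is rejected), while the unsigned part may come before or after it in either case.

```python
import re

# Unsigned suffix followed by an optional long/long-long suffix, or the
# other way round. 'll' and 'LL' are accepted; mixed 'lL' / 'Ll' are not.
SUFFIX = r"(?:[uU](?:ll|LL|[lL])?|(?:ll|LL|[lL])[uU]?)"

# Accept both '0x' and '0X', at least one hex digit, then an optional suffix.
HEX_RE = re.compile(r"0[xX][0-9a-fA-F]+" + SUFFIX + r"?")


def is_hex_constant(text: str) -> bool:
    """Return True if the whole string has the form of a hex integer constant."""
    return HEX_RE.fullmatch(text) is not None
```

The same `SUFFIX` fragment could be appended to decimal and octal patterns in the same way.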

Also, the C standard (§6.4.4.1) distinguishes between digits and non-zero digits in a decimal constant:

decimal-constant:
    nonzero-digit
    decimal-constant digit

Any integer constant starting with a zero is an octal constant, never a decimal constant. In particular, writing 0 is writing an octal constant.
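The rule can be sketched as a small classifier. The function name `integer_constant_base` is invented for illustration, and the function assumes its argument is already a well-formed integer constant; it does no validation of its own.

```python
def integer_constant_base(text: str) -> int:
    """Return the base (16, 8 or 10) implied by the constant's prefix.

    Assumes `text` is a well-formed C integer constant; nothing is validated.
    """
    if text[:2] in ("0x", "0X"):
        return 16  # hexadecimal prefix
    if text.startswith("0"):
        return 8   # any other leading zero means octal, including plain "0"
    return 10      # starts with a non-zero digit


print(integer_constant_base("0"))   # 8
print(integer_constant_base("17"))  # 10
```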

First, C does not support Unicode literals, so you can eliminate the last rule. You also only define integer literals, not floating-point literals and not string or character literals. For convenience, I assume that this is what you intended.

INT    := OCTINT | DECINT | HEXINT
DECINT := [1-9] [0-9]* [uU]? [lL]? [lL]?
OCTINT := 0 [0-7]* [uU]? [lL]? [lL]?
HEXINT := 0 [xX] [0-9a-fA-F]+ [uU]? [lL]? [lL]?

These only describe the form of the literals, not any logic such as maximum values.
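One possible transcription of these rules into a Python regular expression is sketched below. The names `INT_RE` and `parse_int_constant` are hypothetical, and the hex prefix is written as `0[xX]` so that the upper-case prefix is accepted too. The value is computed separately with `int()`, which underlines that the pattern checks form only: no upper bound is enforced.

```python
import re
from typing import Optional

INT_RE = re.compile(
    r"(?:(?P<hex>0[xX][0-9a-fA-F]+)"   # HEXINT
    r"|(?P<dec>[1-9][0-9]*)"           # DECINT
    r"|(?P<oct>0[0-7]*))"              # OCTINT
    r"(?P<suffix>[uU]?[lL]?[lL]?)"     # suffix as written in the rules
)


def parse_int_constant(text: str) -> Optional[int]:
    """Return the numeric value of the constant, or None if the form is wrong."""
    m = INT_RE.fullmatch(text)
    if m is None:
        return None
    if m.group("hex"):
        return int(m.group("hex"), 16)
    if m.group("dec"):
        return int(m.group("dec"), 10)
    return int(m.group("oct"), 8)
```

Note that this simple suffix pattern accepts a mixed-case `lL` and rejects the order `LU`, so it does not reproduce the standard's suffix rules exactly.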

From a Perl point of view, I came up with the following regexp after reading ISO C 2011:

my $I_CONSTANT = qr/^(?:(0[xX][a-fA-F0-9]+(?:[uU](?:ll|LL|[lL])?|(?:ll|LL|[lL])[uU]?)?)             # Hexadecimal
                      |([1-9][0-9]*(?:[uU](?:ll|LL|[lL])?|(?:ll|LL|[lL])[uU]?)?)                    # Decimal
                      |(0[0-7]*(?:[uU](?:ll|LL|[lL])?|(?:ll|LL|[lL])[uU]?)?)                        # Octal
                      |([uUL]?'(?:[^'\\\n]|\\(?:[\'\"\?\\abfnrtv]|[0-7]{1,3}|x[a-fA-F0-9]+))+')     # Character
                    )$/x;
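For readers who want to experiment with the character-constant branch on its own, here is a rough Python port. `CHAR_RE` and `is_char_constant` are names chosen for this sketch, and the octal escape is written with the quantifier `{1,3}` (one to three octal digits).

```python
import re

CHAR_RE = re.compile(
    r"""[uUL]?'(?:[^'\\\n]|\\(?:['"?\\abfnrtv]|[0-7]{1,3}|x[0-9a-fA-F]+))+'"""
)


def is_char_constant(text: str) -> bool:
    """Return True if the whole string has the form of a character constant."""
    return CHAR_RE.fullmatch(text) is not None
```

Under this pattern `'a'`, `'\n'`, `'\x41'` and `L'a'` are accepted, while the empty `''` and an unknown escape such as `'\q'` are rejected.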
