
TCL: regexp to find if/while/for in a string

I am trying to write a regexp that finds the for/if/while keywords in a string read from a C++ source file, but excludes any identifiers that merely contain them, such as:

WhatifYes()
Whatfor()
Waitforwhile()

I have written my regexp like this:

if { [ regexp {(for|while|if)(\s+)(\()} $lineValue ] } { 

But it is not picking up cases like:

while(( int x = 0 ) > 0 );
while(( int x = 0 ) > 0 )
for(int y =0 ; ; )
for(int y =0 ; ; );
if( (int x = 9) > 0 )
if( (int x = 9) > 0 );

Initially I thought this was because my regexp is framed like this:

if/for/while \s+ ( #space or multiple spaces

But I tried including spaces in the examples above:

while (( int x = 0 ) > 0 );
while (( int x = 0 ) > 0 )
if ( (int x = 9) > 0 )
if ( (int x = 9) > 0 );

The regexp still does not work. Please let me know what regexp I should use to capture them.

Part of your problem is easy to address, and part is very hard.

The easy part is ensuring that you've got a whole word: the \m constraint escape matches only at the start of a word, and the \M constraint escape matches only at the end, so we can use:

# Nothing capturing; you can add that as necessary
# Ellipsis for the bits I've not talked about yet
regexp {\m(?:while|if|for)\M\s*...} ...
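
To see the word constraints at work, here is a minimal sketch that runs the keyword part of the pattern against the lines from the question. The proc name hasKeyword and the sample list are made up for illustration, and the pattern stops at the open paren; it does not try to match the condition itself, and it will also fire on keywords inside comments or string literals.

```tcl
# Hypothetical helper: 1 if the line contains while/if/for as a whole
# word followed by optional whitespace and an open paren, else 0.
proc hasKeyword {line} {
    return [regexp {\m(?:while|if|for)\M\s*\(} $line]
}

# Sample lines taken from the question.
set samples [list \
    {while(( int x = 0 ) > 0 );} \
    {for(int y =0 ; ; )} \
    {if ( (int x = 9) > 0 )} \
    {WhatifYes()} \
    {Whatfor()} \
    {Waitforwhile()}]

foreach line $samples {
    puts [format "%d  %s" [hasKeyword $line] $line]
}
```

The first three lines should report 1 and the last three 0, because in WhatifYes, Whatfor and Waitforwhile the keyword is not bounded by a word start on the left.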

The very hard part is matching the part in parentheses. The problem is that balanced parentheses form a “language” (in the theoretical sense) that a regular expression cannot match; it needs a different kind of parser (e.g., a recursive descent parser, which has a more complex state model than the finite automata used in RE matching). What's more, () characters are common inside those expressions. The easiest approach is to match against a close parenthesis at the end of the line, possibly followed by a semicolon, but that is definitely not correct in general. Alternatively, it is possible to support a limited number of levels of nested parens:

# Match a few levels...
regexp {\m(?:while|if|for)\M\s*\((?:[^()]|\((?:[^()]|\([^()]*\))*\))*\)} ...

So, let's break that RE down:

\m                                Word start
(?:while|if|for)                  One of the keywords 
\M                                Word end
\s*                               Optional spaces
\(                                Open paren
  (?:                             Either...
    [^()]                           Non-paren...
  |                               Or...
    \(                              Open paren
      (?:                           Either...
        [^()]                         Non-paren...
      |                             Or...
        \(                            Open paren
          [^()]*                      Non-parens
        \)                            Close paren
      )*                            ... as many of the above as needed
    \)                              Close paren
  )*                              ... as many of the above as needed
\)                                Close paren

If you look at the above, you'll notice a pattern. Yes, you can keep on nesting to go as deep as you want. What you can't do is make the RE engine do that nesting for you.
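
Since the nesting follows a fixed pattern, the RE can be generated by a few lines of Tcl instead of being typed out by hand. The following is only a sketch: the proc names parenPattern and keywordPattern are invented for this example, and the depth still has to be chosen up front, so deeper nesting than that will not match.

```tcl
# Hypothetical helper: build an RE for a parenthesised group that may
# contain up to $depth further levels of nested parens.
proc parenPattern {depth} {
    # Innermost level: parens with no parens inside.
    set pat {\([^()]*\)}
    for {set i 0} {$i < $depth} {incr i} {
        # Wrap the previous level: either a non-paren char or a nested group.
        set pat "\\((?:\[^()\]|$pat)*\\)"
    }
    return $pat
}

# Hypothetical helper: keyword, optional spaces, then the nested group.
proc keywordPattern {depth} {
    return "\\m(?:while|if|for)\\M\\s*[parenPattern $depth]"
}

# Depth 2 reproduces the hand-written RE broken down above.
set re [keywordPattern 2]
if {[regexp $re {while(( int x = 0 ) > 0 );} match]} {
    puts "matched: $match"
}
```

With depth 0 the same line is rejected, because the condition contains one nested pair of parens; raising the depth only moves the limit, it does not remove it.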

In your regex you are using \s+. That means there must be at least one space, tab or line break between the keyword and the paren. Use \s* (zero or more whitespace characters) instead, and add logic for what comes before the keyword:

if { [ regexp {(^|[ \t])(for|while|if)(\s*)(\()} $lineValue ] } { 
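
As a quick check of this pattern, the sketch below wraps it in a proc (the name hasKeywordLoose and the test lines are invented for illustration) and compares it with a variant that uses the \m and \M word constraints. One assumption worth noting: requiring line start, a space or a tab before the keyword means a keyword that directly follows other punctuation, such as the closing brace of a do-while, is not detected.

```tcl
# Hypothetical wrapper around the pattern above.
proc hasKeywordLoose {line} {
    return [regexp {(^|[ \t])(for|while|if)(\s*)(\()} $line]
}

# Variant using word constraints instead of an explicit prefix.
proc hasKeywordWord {line} {
    return [regexp {\m(?:for|while|if)\M\s*\(} $line]
}

# The brace is backslash-escaped so that it is not counted by Tcl's parser.
set doWhile "\}while(x > 0);"

foreach line [list {for(int y =0 ; ; )} {Whatfor()} $doWhile] {
    puts [format "%d %d  %s" [hasKeywordLoose $line] [hasKeywordWord $line] $line]
}
```

Both procs accept for(...) and reject Whatfor(), but only the word-constraint variant reports the while that follows the closing brace.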
