
Complicated Search and Replace using RegEx

I'm trying to convert a bunch of custom "recipes" from an old proprietary format to something that is ultimately compatible with C#. I think the easiest way to do this would be to use regular expressions, but I'm having trouble figuring out the expression. The pieces that I need to convert with this RegEx are the IF statements. Here are a few examples of the original recipes...

  • IF(A = B,C,D)
  • IF(AA = BB,IF(E=F,G,H),DD)
  • IF(S1<>R1,ROUND(ROUND(S2/S1,R2)*S3,R3),R4)

The first one is straightforward... If A = B then C else D.
The second one is similar, except that the IF statements are nested.
And the third one includes additional ROUND function calls in the results.

I've stumbled across regex101.com and have managed to put together the following pattern, which is getting close. It works for the first example, but not for the other two: (.*?)IF[^\S\r\n]*\((.*?),(.*?),(.*?)\)

Ultimately, what I want to do is use a regular expression to turn the three examples above into:

  • if (A == B) { C } else { D }
  • if (AA == BB) { if (E == F) { G } else { H } } else { DD }
  • if (S1 <> R1) { ROUND(ROUND(S2/S1,R2)*S3,R3) } else { R4 }

Note that the whitespace in the results is not particularly important. I just formatted it for readability. Also, the "ROUND" functions will be replaced separately with C# Math.Round() calls, so there is no need to worry about those here. (All I should need to do to them is add "Math." and fix the capitalization.)

I'll keep plugging away at this, but if anyone out there has the RegEx experience to figure this out, I would appreciate it.

EDIT: With some additional effort, I've expanded my first expression into the following... (.*?)IF[^\S\r\n]*\((.*?),(([^\(]*)|(.*?\(.*?\))),(([^\(]*)|(.*?\(.*?\)))\) And with the following replace expression... $1if($2) {$3} else {$6} I'm almost there. It's just the nested IF statements that are left. (And although I'd prefer to do this in a single pass, if a recursive expression is not going to work, I could rig something up to run the results through the expression a second time to deal with the nested IF statements. It's not ideal, but if it's the best I have, I could live with it.)

The problem with using regex to parse an arbitrary recursive grammar is that regexes are not particularly suitable for recursion. Some regex implementations have limited support for recursion, but it's tricky to make it work for anything slightly more complicated than simple balanced parentheses.

That being said, for your particular case, although at first sight it looks like a recursive grammar, it might be possible to cheat.

In IF(S1<>R1,ROUND(ROUND(S2/S1,R2)*S3,R3),R4), if it is guaranteed that neither S1<>R1 nor R4 contains a comma, then you can use the following regex:

IF\(([^,]*),(.*),([^,]+)\)

Try it here: https://regexr.com/67r56

How it works: the first group matches everything after IF( up to the first comma. The second group then greedily matches everything to the end of the string and starts backtracking until the very last comma of the string is "released" from it. After that, the third group matches the "released tail" of the string, up to the closing parenthesis.
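As a rough illustration of the capture behavior described above, here is a minimal Python sketch (Python is used only for demonstration; the helper name `split_if` is made up for this example). It applies the pattern to whole recipes and prints the three captured groups. The last call shows what happens when the comma-free assumption does not hold.

```python
import re

# Pattern from the answer: the condition and the else-branch must be comma-free.
IF_PATTERN = re.compile(r"IF\(([^,]*),(.*),([^,]+)\)")


def split_if(recipe):
    """Return (condition, then, else) for one IF(...) recipe, or None."""
    match = IF_PATTERN.fullmatch(recipe)
    return match.groups() if match else None


print(split_if("IF(A = B,C,D)"))
# ('A = B', 'C', 'D')
print(split_if("IF(S1<>R1,ROUND(ROUND(S2/S1,R2)*S3,R3),R4)"))
# ('S1<>R1', 'ROUND(ROUND(S2/S1,R2)*S3,R3)', 'R4')

# A nested IF in the else position contains commas, so the split goes wrong:
print(split_if("IF(A=B,C,IF(E=F,G,H))"))
# ('A=B', 'C,IF(E=F,G', 'H)')
```

The second group can contain any number of commas, which is why the nested ROUND calls survive. The third group cannot, which is where the trick stops working.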


However, as I mentioned in the comments, if S1, R1 or R4 are themselves expressions, this regex trick won't work, and you'd need to use a proper recursive parser. Fortunately, there are plenty of parser/combinator libraries for user-defined grammars (or you might even find one that already works for your grammar). Once your expression is parsed into an AST, it's fairly easy to transform it into the desired form.

Alternatively, you can look into writing your own simple parser. It should be fairly straightforward, as you only care about nested parentheses and commas.
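A hand-written parser along those lines might look like the sketch below. It is only one possible approach, written in Python for illustration, and all function names are invented for this example. It assumes that every IF takes exactly three arguments, that parentheses are balanced, and that a lone `=` in a condition means equality. It leaves `<>` and ROUND untouched, since those are handled separately.

```python
import re

IF_OPEN = re.compile(r"\bIF\s*\(")
# A single '=' that is not part of '<=', '>=', '==' or '!='.
LONE_EQUALS = re.compile(r"\s*(?<![<>=!])=(?!=)\s*")


def split_top_level(text):
    """Split text on commas that are not inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(text[start:i])
            start = i + 1
    parts.append(text[start:])
    return parts


def find_close(text, open_pos):
    """Return the index of the ')' matching the '(' at open_pos."""
    depth = 0
    for i in range(open_pos, len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("unbalanced parentheses in: " + text)


def convert(expr):
    """Rewrite every IF(cond,then,else) in expr, nested ones included."""
    match = IF_OPEN.search(expr)
    if match is None:
        return expr
    open_pos = match.end() - 1
    close_pos = find_close(expr, open_pos)
    parts = split_top_level(expr[open_pos + 1:close_pos])
    if len(parts) != 3:
        raise ValueError("IF needs exactly three arguments: " + expr)
    # Recurse into each argument so nested IFs are rewritten too.
    cond, then_part, else_part = (convert(p.strip()) for p in parts)
    cond = LONE_EQUALS.sub(" == ", cond)
    rewritten = "if (%s) { %s } else { %s }" % (cond, then_part, else_part)
    return expr[:match.start()] + rewritten + convert(expr[close_pos + 1:])


print(convert("IF(AA = BB,IF(E=F,G,H),DD)"))
# if (AA == BB) { if (E == F) { G } else { H } } else { DD }
```

Because the commas are split by parenthesis depth rather than by pattern matching, a nested IF works in any argument position, including the else branch.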
