C# Regex search for dotted functions and their arguments

I need to search for all occurrences of chained C#-like functions in a text string. For example, I would like to break out each method and its parenthesized arguments from a string such as this:

object.method(1, "2", abc).method2().method3("test(), 1, 2, 3").method4("\"Hi\"")

Here is the regex pattern I almost had working:

(?<objectName>[^\}]*?)\.(?<methodName>[^\}]*?)\(((?:[^;"']|"[^"]*"|'[^']*')+)*?\)

This extracts the objectName and the first methodName correctly, but lumps

1, "2", abc).method2().method3("test, 1, 2, 3").method4("\\"Hi\\""

all into the third group, available as "$1".

My latest approach was to divide and conquer by removing the objectName specification, as that is easy to parse out. This led me to using:

\.(?<methodName>[^(]*?)\(((?:[^;"']|"[^"]*"|'[^']*')+)*?\)

Which yields similar results as before, obviously without the objectName. I did this to see if I could get a global result, but I could not get the regex syntax right.

In summary, I need to parse multiple chained .method(parameters) occurrences into their constituent parts, named "methodName" and "parameters". I have worked out a few things, but my regex skills are quite rusty at best and I am unable to get past this at the moment. I appreciate any help you may have to offer.

I have been using this site for testing: http://regexstorm.net/tester

UPDATE: To clarify, the requirements do not include supporting C# lambda expressions, only the dotted function syntax. This is not intended to be a full C# parser. The only need is the dotted method chaining. I apologize for any confusion. The pattern I was looking to break out is:

object.method(arguments).method(arguments).method(arguments)...

My approach was to first extract the object name, which is a simple operation that does not require Regex. That leaves the following for Regex to parse into two constituent parts:

.method(arguments).method(arguments).method(arguments)...

Which would yield:

method   arguments
method   arguments
method   arguments
...

arguments may be empty (missing), as in .method(), or method may actually be a property (no parentheses and no arguments), as in:

.method.method().method(arguments)

Which would yield:

method   (null)
method   (string.Empty)
method   arguments

arguments would contain everything between the opening and closing parentheses; these do not need to be parsed at this stage, as they would be processed in a subsequent Regex operation.

It seems to me that Regex should be capable of detecting this simple pattern: dot-method-openPar-argumentsStr-closePar, then the next dot-method-openPar-argumentsStr-closePar, and so forth.

This is the extent of the grammar: no comments, no lambdas, just object.method(arguments).method()...
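The restricted pattern described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a general solution: it assumes the object name contains no dot, that arguments contain no parentheses outside string or character literals, and that literals use backslash escapes. The class name `ChainRegexSketch` and the method `Split` are hypothetical names chosen for this example. The `\G` anchor makes each match start where the previous one ended, so a dot inside a string literal is never mistaken for the start of a new call.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; assumes no parentheses outside string/char literals.
public static class ChainRegexSketch
{
    // \G            : match must start where the previous match ended
    // methodName    : one or more word characters after the dot
    // parameters    : optional; everything between the parentheses, where
    //                 "..." and '...' literals (with backslash escapes) are
    //                 consumed whole so their contents are not inspected
    private static readonly Regex Call = new Regex(
        @"\G\.(?<methodName>\w+)(?:\((?<parameters>(?:""(?:\\.|[^""\\])*""|'(?:\\.|[^'\\])*'|[^()""'])*)\))?");

    public static List<(string Method, string Parameters)> Split(string expression)
    {
        var result = new List<(string Method, string Parameters)>();

        // Skip the object name: start at the first dot (assumed to end the name).
        int firstDot = expression.IndexOf('.');
        if (firstDot < 0) return result;

        foreach (Match m in Call.Matches(expression.Substring(firstDot)))
        {
            Group p = m.Groups["parameters"];
            // A property (no parentheses) leaves the group unmatched: report null.
            result.Add((m.Groups["methodName"].Value, p.Success ? p.Value : null));
        }
        return result;
    }
}
```

Under these assumptions, `ChainRegexSketch.Split("object.method.method().method(arguments)")` gives `method` with null parameters, `method` with an empty string, and `method` with `arguments`. Matching simply stops at the first piece of text that does not fit the pattern, such as an argument containing a nested call.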

I hope this helps.

This can't be done properly with a regex, because your arguments are just too unpredictable, and the regex grammar level is not comparable with the C# parser grammar. For example, an argument can contain a string with any content:

method1("x.hiThere().lol()").method2()

it can nest:

method1(x=>method2().method3())

it can even do this:

a("b().c()",d=> d(").hi()"))

To solve your problem you need to learn about grammars and write a C# grammar for this particular task. In terms of frameworks, you can start with the ANTLR project.

Explanation

The reason you can't do this is the difference in grammar types. Regex uses a regular language, which is Type-3 in the Chomsky hierarchy. C# uses a context-free language, which is Type-2 in the Chomsky hierarchy.

Represented visually, C# is a much more powerful language than the regex language: in the Chomsky hierarchy, the regular languages are a proper subset of the context-free languages.

For example, your case falls into parser territory just because of lambdas in C#:

method1(x=>
{
    ....
    /* some code here */
    ....
}).method2()
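What the nested examples above require is counting parenthesis depth while skipping over string literals, which is the step a regular pattern cannot express in general. Short of a full grammar, a small hand-written scanner can do that counting. The following is a minimal sketch, not a C# parser: `ChainScanner`, `Split`, and `FindClosingParen` are hypothetical names, the object name is assumed to contain no dot, and comments, verbatim strings (`@"..."`), and interpolated strings are assumed not to occur.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; handles nesting and "..."/'...' literals only.
public static class ChainScanner
{
    public static List<(string Method, string Arguments)> Split(string chain)
    {
        var calls = new List<(string Method, string Arguments)>();
        int i = chain.IndexOf('.');                 // skip the object name
        while (i >= 0 && i < chain.Length && chain[i] == '.')
        {
            int nameStart = ++i;                    // step past the dot
            while (i < chain.Length && (char.IsLetterOrDigit(chain[i]) || chain[i] == '_')) i++;
            if (i == nameStart) throw new FormatException("Expected a member name after '.'.");
            string name = chain.Substring(nameStart, i - nameStart);

            string args = null;                     // null marks a property (no parentheses)
            if (i < chain.Length && chain[i] == '(')
            {
                int close = FindClosingParen(chain, i);
                args = chain.Substring(i + 1, close - i - 1);
                i = close + 1;
            }
            calls.Add((name, args));
        }
        return calls;
    }

    // Returns the index of the ')' that balances the '(' at position 'open'.
    private static int FindClosingParen(string s, int open)
    {
        int depth = 0;
        char quote = '\0';                          // '\0' means "not inside a literal"
        for (int i = open; i < s.Length; i++)
        {
            char c = s[i];
            if (quote != '\0')
            {
                if (c == '\\') i++;                 // skip the escaped character
                else if (c == quote) quote = '\0';  // literal ends
            }
            else if (c == '"' || c == '\'') quote = c;
            else if (c == '(') depth++;
            else if (c == ')' && --depth == 0) return i;
        }
        throw new FormatException("Unbalanced parentheses or unterminated literal.");
    }
}
```

With an object name in front, the third example above becomes `obj.a("b().c()",d=> d(").hi()"))`, and the scanner reports a single call `a` whose arguments are everything between the outermost parentheses. Anything beyond these assumptions, such as a comment containing a parenthesis, still calls for a real grammar.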
