
How to Regex match a pattern with parentheses in C#

Background: I'm doing some complicated code generation that requires me to extract the methods within a C# interface file. I cannot simply use reflection because this code will feed a T4 template which will not have the compiled code to reflect upon. Thus I am attempting parsing. I can easily make my own parser, but it would be nice if there was a regular expression solution.

Question: Is there a regex pattern, and if so what is it, that would match the method declarations (including the return types and parameters) in the string below using C#'s regular expression library?

    string testing = @"
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Threading.Tasks;

    namespace ConsoleApplication1
    {
        public interface Service
        {
            int Test1(int a);

            int Test2(int a, int b);

            int Test3(
                int a,
                int b);

            int Test4(out int a);
        }
    }
    ";

The regex pattern I desire should make four matches:

  1. "int Test1(int a);"
  2. "int Test2(int a, int b);"
  3. "int Test3( int a, int b);" [note: #3 would be multi-line]
  4. "int Test4(out int a);"

Solution Attempt: Here is possibly the closest I have come to a regex solution thus far:

string WhiteSpacePattern = @"\s+";
string PossibleWhiteSpacePattern = @"\s*";
string CsharpWordPattern = @"[a-zA-Z_]+";
string ParenthesesPattern = @"[(][\s\S]*?[)]";

string DoubleCsharpWordPattern = CsharpWordPattern + WhiteSpacePattern + CsharpWordPattern;
string MethodDeclarationPattern =
    DoubleCsharpWordPattern +
    PossibleWhiteSpacePattern +
    ParenthesesPattern;

Pattern usage example:

MatchCollection tests = Regex.Matches(testing, MethodDeclarationPattern);

The individual patterns work perfectly on their own (CsharpWordPattern, ParenthesesPattern, WhiteSpacePattern, and PossibleWhiteSpacePattern). However, when I put them all together into a single pattern (MethodDeclarationPattern), the full pattern fails to match.

How does MethodDeclarationPattern or my usage example need to be altered so that it will start matching the method declarations in the interface code?

To match literal parens, escape them with backslashes ([(] and [)] work as well, this is just the more usual spelling):

string ParenthesesPattern = @"\([\s\S]*?\)";

That regex snippet matches an opening paren, then the shortest possible run of any characters ([\s\S] matches every character, newlines included), then the first closing paren. You're putting it at the end of your overall regex.

Your complete concatenated regex looks like this:

[a-zA-Z_]+\s+[a-zA-Z_]+\s*[(][\s\S]*?[)]

Identifier, whitespace, identifier, optional whitespace, open paren, anything, close paren.

Because [a-zA-Z_]+ contains no digits, the method name has to consist of letters and underscores only for that to match, like this:

"int foo (int a)"

None of Test1 through Test4 qualifies, so the full pattern finds nothing in your test string.

I believe you'll have better success with something like this, where each paren takes only the whitespace next to it rather than arbitrary characters:

string openParenPattern = @"\(\s*";
string closeParenPattern = @"\s*\)";
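
Whatever the paren pieces look like, the identifier class matters too, since names such as Test1 contain a digit. As a rough check, the sketch below counts matches of the question's pattern shape with two different word patterns. The class name PatternCheck, the shortened sample text and the wider identifier class are assumptions made for illustration, not part of the original code.

```csharp
using System.Text.RegularExpressions;

public static class PatternCheck
{
    // Shortened, hypothetical stand-in for the interface body in the question.
    public const string Sample = @"
        int Test1(int a);
        int Test3(
            int a,
            int b);
        int foo (int a);
    ";

    // Same shape as the question's pattern:
    // word, whitespace, word, optional whitespace, "(", anything (lazy), ")".
    public static string Build(string wordPattern) =>
        wordPattern + @"\s+" + wordPattern + @"\s*" + @"\([\s\S]*?\)";

    // Number of matches in Sample for a given word pattern.
    public static int Count(string wordPattern) =>
        Regex.Matches(Sample, Build(wordPattern)).Count;
}
```

With the question's word pattern, PatternCheck.Count(@"[a-zA-Z_]+") only finds the foo declaration, while PatternCheck.Count(@"[a-zA-Z_][a-zA-Z0-9_]*") also picks up Test1 and the multi-line Test3.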

What you really need, conceptually, is this (leaving out whitespace, no need to clutter it up with that):

  1. identifier
  2. identifier
  3. open paren
  4. ((ref|out)? identifier identifier comma)*
  5. ((ref|out)? identifier identifier)?
  6. close paren

You know all the syntax for that, I think. You'll have nested groups. Looking at it, I'm really starting to warm up to your idea of putting sub-regexes in string variables and then concatenating them.
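
The parameter piece of the outline above might be sketched as a sub-regex held in a string constant. The names Ident, Space, Param and the helper IsParam are hypothetical, and the identifier class is an assumption. Generic or array types such as List<int> or int[] are deliberately not handled.

```csharp
using System.Text.RegularExpressions;

public static class ParamSketch
{
    // Assumed building blocks (not taken from the original post).
    const string Ident = @"[a-zA-Z_][a-zA-Z0-9_]*";
    const string Space = @"\s+";

    // Optional ref/out modifier, then a type name, then a parameter name.
    // Non-capturing groups keep the nesting from adding capture groups.
    public const string Param =
        "(?:(?:ref|out)" + Space + ")?" + Ident + Space + Ident;

    // True when the whole input is exactly one parameter.
    public static bool IsParam(string text) =>
        Regex.IsMatch(text, "^" + Param + "$");
}
```

Because keywords look like identifiers to this pattern, an input like "ref int" is also accepted (as type "ref", name "int"). A real parser would need a keyword check on top.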

The following code matches all four method declarations in your test string:

//  This has one bug: It matches "int foo(int a,)"
//  Somebody good with regexes could fix that. 
var methodPattern =
    //  return type
        identPattern + spacePattern
    //  method name
    + identPattern + spacePattern
    //  open paren
    + openParenPattern + spacePattern
    //  Zero or more parameters followed by commas
    + "(" + paramPattern + spacePattern + "," + spacePattern + ")*" + spacePattern
    //  Final (or only) parameter not followed by a comma
    + "(" + paramPattern + spacePattern + ")?" + spacePattern
    //  Close paren
    + closeParenPattern;
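
One way to avoid the trailing-comma match noted in the comment is to require a first parameter and then repeat comma-plus-parameter, instead of repeating parameter-plus-comma. The sketch below is self-contained and rests on assumptions: the identifier, whitespace and parameter pieces are not shown above, so Ident, Ws, OptWs, Param, ParamList and the MethodFinder class are all hypothetical. It also expects the terminating semicolon, as in the matches the question asks for.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class MethodFinder
{
    // Assumed definitions; the originals are not shown in the post.
    const string Ident = @"[a-zA-Z_][a-zA-Z0-9_]*";
    const string Ws = @"\s+";     // required whitespace
    const string OptWs = @"\s*";  // optional whitespace
    const string Param = "(?:(?:ref|out)" + Ws + ")?" + Ident + Ws + Ident;

    // A first parameter, then any number of ", parameter".
    // The whole list is optional, so "()" is still accepted.
    const string ParamList =
        "(?:" + Param + "(?:" + OptWs + "," + OptWs + Param + ")*)?";

    // return type, method name, "(", parameter list, ")", ";"
    public const string Method =
        Ident + Ws + Ident + OptWs
        + @"\(" + OptWs + ParamList + OptWs + @"\)" + OptWs + ";";

    // Returns the text of every method declaration found in the source.
    public static List<string> Find(string source) =>
        Regex.Matches(source, Method).Cast<Match>().Select(m => m.Value).ToList();
}
```

Called on the question's test string, MethodFinder.Find(testing) should return the four declarations, with the Test3 match spanning several lines. Like any regex approach, it will miss generics, arrays, default values and attributes.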
