
What's wrong with my pattern string (regular expression, C#)?

I ran into a string-parsing problem and want to solve it with a regular expression. The input is always a string of the form: %function_name%(IN: param1, ..., paramN; OUT: param1, ..., paramN)

I wrote a pattern:

string pattern = @"[A-za-z][A-za-z0-9]*\\(IN:\\s*(([A-za-z][A-za-z0-9](,|;))+|;)\\s*OUT:(\\s*[A-za-z][A-za-z0-9],?)*\\)";

This pattern detects my input strings, but what I actually want as output is two arrays of strings. One must contain the input parameters (those after "IN:"), and the second must contain the names of the output parameters. Parameter names can contain digits and '_'.

A few examples of real input strings:

Add_func(IN: port_0, in_port_1; OUT: out_port99)

Some_func(IN:;OUT: abc_P1)

Some_func2(IN: input_portA;OUT:)

Please tell me how to write a correct pattern.

You can use this pattern, which captures every function together with its individual parameters in a single pass:

(?<funcName>\w+)\(IN: ?|OUT: ?|\G(?<inParam>[^,;()]+)?(?=[^)(;]*;)\s*[,;]\s*|\G(?<outParam>[^,()]+)(?=[^;]*\s*\))\s*[,)]\s*

Pattern details:

    (?<funcName>\w+)\(IN: ?  # capture the function name and match "(IN: "
  |                          # OR
    OUT: ?                   # match "OUT: "
  |                          # OR
    \G(?<inParam>[^,;()]+)?  # contiguous match that captures an IN param (may be empty)
    (?=[^)(;]*;)             # check that it is always followed by ";"
    \s*[,;]\s*               # match "," or ";" (to be always contiguous)
  |                          # OR
    \G(?<outParam>[^,()]+)   # contiguous match that captures an OUT param
    (?=[^;]*\s*\))           # check that it is always followed by ")"
    \s*[,)]\s*               # match "," (to be always contiguous) or ")"

(To obtain a cleaner result, walk through the matches (with a foreach) and skip the empty entries.)

Example code:

static void Main(string[] args)
{
    string subject = @"Add_func(IN: port_0, in_port_1; OUT: out_port99)
        Some_func(IN:;OUT: abc_P1)
        shift_data(IN:po1_p0;OUT: po1_p1, po1_p2)
        Some_func2(IN: input_portA;OUT:)";
    string pattern = @"(?<funcName>\w+)\(IN: ?|OUT: ?|\G(?<inParam>[^,;()]+)?(?=[^)(;]*;)\s*[,;]\s*|\G(?<outParam>[^,()]+)(?=[^;]*\s*\))\s*[,)]\s*";
    Match m = Regex.Match(subject, pattern);
    while (m.Success)
    {
        if (m.Groups["funcName"].ToString() != "")
        {
            Console.WriteLine("\nfunction name: " + m.Groups["funcName"]);
        }
        if (m.Groups["inParam"].ToString() != "")
        {
            Console.WriteLine("IN param: " + m.Groups["inParam"]);
        }
        if (m.Groups["outParam"].ToString() != "")
        {
            Console.WriteLine("OUT param: "+m.Groups["outParam"]);
        }
        m = m.NextMatch();
    }
}
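
Since the question asks for two arrays per function, the matches from this pattern can also be grouped while they are walked. The following is a minimal sketch, assuming the pattern exactly as written above; `SignatureScanner`, `Scan` and the tuple element names are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SignatureScanner
{
    // Same pattern as in the answer above (unchanged).
    private const string Pattern =
        @"(?<funcName>\w+)\(IN: ?|OUT: ?|\G(?<inParam>[^,;()]+)?(?=[^)(;]*;)\s*[,;]\s*|\G(?<outParam>[^,()]+)(?=[^;]*\s*\))\s*[,)]\s*";

    // Returns one entry per function found, with its IN and OUT parameter lists.
    public static List<(string Name, List<string> InParams, List<string> OutParams)> Scan(string subject)
    {
        var functions = new List<(string Name, List<string> InParams, List<string> OutParams)>();

        foreach (Match m in Regex.Matches(subject, Pattern))
        {
            if (m.Groups["funcName"].Success)
            {
                // A new function starts: open a fresh pair of lists.
                functions.Add((m.Groups["funcName"].Value, new List<string>(), new List<string>()));
                continue;
            }
            if (functions.Count == 0) continue; // e.g. a stray "OUT:" before any function

            // The tuple is copied, but its lists are shared references.
            var current = functions[functions.Count - 1];

            // Empty captures (such as the one produced by "IN:;") are skipped.
            if (m.Groups["inParam"].Length > 0)
                current.InParams.Add(m.Groups["inParam"].Value.Trim());
            if (m.Groups["outParam"].Length > 0)
                current.OutParams.Add(m.Groups["outParam"].Value.Trim());
        }
        return functions;
    }
}
```

Calling `SignatureScanner.Scan(subject)` with the `subject` string from the example gives one entry per function; each list can be turned into an array with `ToArray()` if arrays are required.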

Another way is to match all IN parameters as one string and all OUT parameters as another, and then split each of these strings on \s*,\s*

Example:

string pattern = @"(?<funcName>\w+)\(\s*IN:\s*(?<inParams>[^;]*?)\s*;\s*OUT\s*:\s*(?<outParams>[^)]*?)\s*\)";
Match m = Regex.Match(subject, pattern);
while (m.Success)
{
    string functionName = m.Groups["funcName"].ToString(); // group name must match (?<funcName>...)
    string[] inParams = Regex.Split(m.Groups["inParams"].ToString(), @"\s*,\s*");
    string[] outParams = Regex.Split(m.Groups["outParams"].ToString(), @"\s*,\s*");
    // Why not construct a "function" object to store all these values
    m = m.NextMatch();
}
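
The comment in the loop above suggests storing the values in a "function" object. A minimal sketch of that idea follows, assuming the second pattern as written; `FunctionSignature`, `ParseAll` and `SplitParams` are hypothetical names. One detail to be aware of: `Regex.Split` on an empty string still returns one empty element, so the sketch filters empty entries out (otherwise "IN:;" would yield an array containing a single empty string).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical container for one parsed "name(IN: ...; OUT: ...)" entry.
public sealed class FunctionSignature
{
    public string Name { get; }
    public string[] InParams { get; }
    public string[] OutParams { get; }

    public FunctionSignature(string name, string[] inParams, string[] outParams)
    {
        Name = name;
        InParams = inParams;
        OutParams = outParams;
    }

    // Same pattern as in the answer above.
    private static readonly Regex Signature = new Regex(
        @"(?<funcName>\w+)\(\s*IN:\s*(?<inParams>[^;]*?)\s*;\s*OUT\s*:\s*(?<outParams>[^)]*?)\s*\)");

    // Splits "a, b ,c" into ["a", "b", "c"]; an empty list yields an empty array.
    private static string[] SplitParams(string list) =>
        Regex.Split(list, @"\s*,\s*").Where(p => p.Length > 0).ToArray();

    public static List<FunctionSignature> ParseAll(string subject)
    {
        var result = new List<FunctionSignature>();
        foreach (Match m in Signature.Matches(subject))
        {
            result.Add(new FunctionSignature(
                m.Groups["funcName"].Value,
                SplitParams(m.Groups["inParams"].Value),
                SplitParams(m.Groups["outParams"].Value)));
        }
        return result;
    }
}
```

With this in place, `FunctionSignature.ParseAll(subject)` returns one object per function, each carrying the two arrays the question asks for.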

The way to do this is with capturing groups. Named capturing groups are the easiest to work with:

// a regex surrounded by parens is a capturing group
// a regex surrounded by (?<name> ... ) is a named capturing group
// here I've tried to surround the relevant parts of the pattern with named groups
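// (identifiers here also accept '_' and more than two characters, and whitespace is allowed
// after "," so that the sample inputs from the question actually match)
var pattern = @"[A-Za-z][A-Za-z0-9_]*\(IN:\s*(((?<inValue>[A-Za-z][A-Za-z0-9_]*)\s*(,\s*|;))+|;)\s*OUT:(\s*(?<outValue>[A-Za-z][A-Za-z0-9_]*),?)*\)";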

// get all the matches. ExplicitCapture is just an optimization which tells the engine that it
// doesn't have to save state for non-named capturing groups
var matches = Regex.Matches(input: input, pattern: pattern, options: RegexOptions.ExplicitCapture)
    // convert from IEnumerable to IEnumerable<Match>
    .Cast<Match>()
    // for each match, select out the captured values
    .Select(m => new {
        // m.Groups["inValue"] gets the named capturing group "inValue".
        // For groups that match multiple times in a single match (as in this case), we access
        // group.Captures, which records each capture of the group. .Cast converts to IEnumerable<Capture>,
        // at which point we can select out capture.Value, which is the actual captured text
        inValues = m.Groups["inValue"].Captures.Cast<Capture>().Select(c => c.Value).ToArray(),
        outValues = m.Groups["outValue"].Captures.Cast<Capture>().Select(c => c.Value).ToArray()
    })
    .ToArray();

I think this is what you are looking for:

[A-Za-z][A-Za-z0-9_]*\(IN:((?:\s*(?:[A-Za-z][A-Za-z0-9_]*(?:[,;])))+|;)\s*OUT:(\s*[A-Za-z][A-Za-z0-9_]*,?)*\)

There were a few problems with the grouping, and the pattern did not allow for the space between multiple IN parameters. It also did not allow the underscore that appears in your examples.

The pattern above works with all of your examples.

Add_func(IN: port_0, in_port_1; OUT: out_port99) will capture:

  • port_0, in_port_1 and out_port99

Some_func(IN:;OUT: abc_P1) will capture:

  • ; and abc_P1

Some_func2(IN: input_portA; OUT:) will capture:

  • input_portA and empty.

After getting these capture groups, you can split them on commas to get your arrays.
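
As a rough illustration of that last step, the sketch below applies the pattern (with the letter ranges written as `A-Za-z`) and turns the two groups into arrays. Two details are choices of this sketch rather than part of the answer: the text of the IN group includes the separators (the trailing `;` and leading spaces), so it is split on both `,` and `;` and trimmed; and because the OUT group sits inside a repetition, its `Value` holds only the last iteration, so the individual values are read from `Group.Captures`. `GroupSplitter` and `TryParse` are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class GroupSplitter
{
    // Group 1 = the whole IN list, group 2 = one OUT parameter per repetition.
    private static readonly Regex Signature = new Regex(
        @"[A-Za-z][A-Za-z0-9_]*\(IN:((?:\s*(?:[A-Za-z][A-Za-z0-9_]*(?:[,;])))+|;)\s*OUT:(\s*[A-Za-z][A-Za-z0-9_]*,?)*\)");

    public static bool TryParse(string line, out string[] inParams, out string[] outParams)
    {
        inParams = Array.Empty<string>();
        outParams = Array.Empty<string>();

        Match m = Signature.Match(line);
        if (!m.Success) return false;

        // Group 1 looks like " port_0, in_port_1;" (or just ";" when there are no IN params).
        inParams = m.Groups[1].Value
            .Split(new[] { ',', ';' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(p => p.Trim())
            .ToArray();

        // Each capture looks like " po1_p1," so strip whitespace and the trailing comma.
        outParams = m.Groups[2].Captures.Cast<Capture>()
            .Select(c => c.Value.Trim().TrimEnd(','))
            .ToArray();

        return true;
    }
}
```

For example, `GroupSplitter.TryParse("Some_func(IN:;OUT: abc_P1)", out var ins, out var outs)` succeeds with an empty `ins` array and `outs` containing `abc_P1`.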

Note: technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please cite this site's URL or the address of the original post.