
What's wrong with my pattern string? (C# regular expression)

I've run into a problem with string parsing and want to solve it with a regular expression. The input is always a string of the form: %function_name%(IN: param1, ..., paramN; OUT: param1,..., paramN)

I wrote this pattern:

string pattern = @"[A-za-z][A-za-z0-9]*\\(IN:\\s*(([A-za-z][A-za-z0-9](,|;))+|;)\\s*OUT:(\\s*[A-za-z][A-za-z0-9],?)*\\)";

This pattern detects my input strings, but what I actually want as output is two arrays of strings. One must contain the input parameters (those after "IN:"), and the other must contain the names of the output parameters. Parameter names can contain digits and '_'.

A few examples of real input strings:

Add_func(IN: port_0, in_port_1; OUT: out_port99)

Some_func(IN:;OUT: abc_P1)

Some_func2(IN: input_portA;OUT:)

Please tell me how to write a correct pattern.

You can use this pattern, which captures every function together with its individual parameters in a single pass:

(?<funcName>\w+)\(IN: ?|OUT: ?|\G(?<inParam>[^,;()]+)?(?=[^)(;]*;)\s*[,;]\s*|\G(?<outParam>[^,()]+)(?=[^;]*\s*\))\s*[,)]\s*

Pattern details:

    (?<funcName>\w+)\(IN: ?  # capture the function name and match "(IN: "
  |                          # OR
    OUT: ?                   # match "OUT: "
  |                          # OR
    \G(?<inParam>[^,;()]+)?  # contiguous match that captures an IN param (optional, so an empty IN list still matches)
    (?=[^)(;]*;)             # check that it is always followed by ";"
    \s*[,;]\s*               # match "," or ";" (to be always contiguous)
  |                          # OR
    \G(?<outParam>[^,()]+)   # contiguous match that captures an OUT param
    (?=[^;]*\s*\))           # check that it is always followed by ")"
    \s*[,)]\s*               # match "," (to be always contiguous) or ")"

(For a cleaner result, walk through the matches, for example with a foreach, and skip the empty entries.)

Example code:

static void Main(string[] args)
{
    string subject = @"Add_func(IN: port_0, in_port_1; OUT: out_port99)
        Some_func(IN:;OUT: abc_P1)
        shift_data(IN:po1_p0;OUT: po1_p1, po1_p2)
        Some_func2(IN: input_portA;OUT:)";
    string pattern = @"(?<funcName>\w+)\(IN: ?|OUT: ?|\G(?<inParam>[^,;()]+)?(?=[^)(;]*;)\s*[,;]\s*|\G(?<outParam>[^,()]+)(?=[^;]*\s*\))\s*[,)]\s*";
    Match m = Regex.Match(subject, pattern);
    while (m.Success)
    {
        if (m.Groups["funcName"].ToString() != "")
        {
            Console.WriteLine("\nfunction name: " + m.Groups["funcName"]);
        }
        if (m.Groups["inParam"].ToString() != "")
        {
            Console.WriteLine("IN param: " + m.Groups["inParam"]);
        }
        if (m.Groups["outParam"].ToString() != "")
        {
            Console.WriteLine("OUT param: "+m.Groups["outParam"]);
        }
        m = m.NextMatch();
    }
}
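Since the question asks for two arrays per function, the loop above can also collect the matches instead of printing them. The following is only a sketch under a few assumptions: it uses C# 9 top-level statements, the `functions` list and its tuple layout are hypothetical names chosen for illustration, and it tests `Group.Success` rather than comparing against an empty string.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Same single-pass pattern as in the example above.
string pattern = @"(?<funcName>\w+)\(IN: ?|OUT: ?|\G(?<inParam>[^,;()]+)?(?=[^)(;]*;)\s*[,;]\s*|\G(?<outParam>[^,()]+)(?=[^;]*\s*\))\s*[,)]\s*";
string subject = "Add_func(IN: port_0, in_port_1; OUT: out_port99)\nSome_func(IN:;OUT: abc_P1)";

// One entry per function: its name plus the IN and OUT parameter lists.
var functions = new List<(string Name, List<string> In, List<string> Out)>();

for (Match m = Regex.Match(subject, pattern); m.Success; m = m.NextMatch())
{
    // A group that did not take part in this match reports Success == false.
    if (m.Groups["funcName"].Success)
        functions.Add((m.Groups["funcName"].Value, new List<string>(), new List<string>()));
    else if (m.Groups["inParam"].Success)
        functions[functions.Count - 1].In.Add(m.Groups["inParam"].Value);
    else if (m.Groups["outParam"].Success)
        functions[functions.Count - 1].Out.Add(m.Groups["outParam"].Value);
}

foreach (var f in functions)
    Console.WriteLine($"{f.Name}: IN=[{string.Join(", ", f.In)}] OUT=[{string.Join(", ", f.Out)}]");
```

Calling `ToArray()` on each list yields the two string arrays requested in the question.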

Another way is to match all the IN parameters as one string and all the OUT parameters as another, and then split each of these strings with \s*,\s*

Example:

string pattern = @"(?<funcName>\w+)\(\s*IN:\s*(?<inParams>[^;]*?)\s*;\s*OUT\s*:\s*(?<outParams>[^)]*?)\s*\)";
Match m = Regex.Match(subject, pattern);
while (m.Success)
{
    string functionName = m.Groups["funcName"].ToString(); // the group name must match the one in the pattern
    string[] inParams = Regex.Split(m.Groups["inParams"].ToString(), @"\s*,\s*");
    string[] outParams = Regex.Split(m.Groups["outParams"].ToString(), @"\s*,\s*");
    // Why not construct a "function" object to store all these values
    m = m.NextMatch();
}
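One detail to watch with this approach: `Regex.Split` applied to an empty string returns an array holding a single empty string, so an empty IN or OUT list would yield one blank parameter. A minimal sketch of one way to handle that, assuming C# 9 top-level statements; `SplitParams` is a hypothetical helper name, not part of the answer above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: splits a comma-separated parameter list and drops
// the blank entry that Regex.Split produces for an empty list.
static string[] SplitParams(string list) =>
    Regex.Split(list.Trim(), @"\s*,\s*").Where(p => p.Length > 0).ToArray();

string pattern = @"(?<funcName>\w+)\(\s*IN:\s*(?<inParams>[^;]*?)\s*;\s*OUT\s*:\s*(?<outParams>[^)]*?)\s*\)";
string subject = "Add_func(IN: port_0, in_port_1; OUT: out_port99)\nSome_func(IN:;OUT: abc_P1)\nSome_func2(IN: input_portA;OUT:)";

foreach (Match m in Regex.Matches(subject, pattern))
{
    string name = m.Groups["funcName"].Value;
    string[] inParams = SplitParams(m.Groups["inParams"].Value);
    string[] outParams = SplitParams(m.Groups["outParams"].Value);
    Console.WriteLine($"{name}: {inParams.Length} IN, {outParams.Length} OUT");
}
```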

The way to do this is with capturing groups. Named capturing groups are the easiest to work with:

// a regex surrounded by parens is a capturing group
// a regex surrounded by (?<name> ... ) is a named capturing group
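// here I've surrounded the relevant parts of the pattern with named groups; compared with the
// question's pattern, A-z is corrected to A-Z, names may be longer than two characters and may
// contain '_', and whitespace is allowed around the separators
var pattern = @"[A-Za-z][A-Za-z0-9_]*\(IN:\s*(((?<inValue>[A-Za-z][A-Za-z0-9_]*)\s*(,|;)\s*)+|;)\s*OUT:(\s*(?<outValue>[A-Za-z][A-Za-z0-9_]*)\s*,?)*\s*\)";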

// get all the matches. ExplicitCapture is just an optimization which tells the engine that it
// doesn't have to save state for non-named capturing groups
var matches = Regex.Matches(input: input, pattern: pattern, options: RegexOptions.ExplicitCapture)
    // convert from IEnumerable to IEnumerable<Match>
    .Cast<Match>()
     // for each match, select out the captured values
    .Select(m => new { 
        // m.Groups["inValue"] gets the named capturing group "inValue"
        // for groups that match multiple times in a single match (as in this case), we access
        // group.Captures, which records each capture of the group. .Cast converts to IEnumerable<T>,
        // at which point we can select out capture.Value, which is the actual captured text
        inValues = m.Groups["inValue"].Captures.Cast<Capture>().Select(c => c.Value).ToArray(),
        outValues = m.Groups["outValue"].Captures.Cast<Capture>().Select(c => c.Value).ToArray()
    })
    .ToArray();
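The difference between a group's `Value` and its `Captures` is easy to see on a smaller input. This is a sketch only: the input string and the `item` group are made up for illustration, and the snippet assumes C# 9 top-level statements.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Illustrative input and pattern, not taken from the question.
Match m = Regex.Match("a,b,c", @"^(?:(?<item>\w+),?)+$");

// Value only reflects the final repetition of the group...
string last = m.Groups["item"].Value;

// ...while Captures keeps every repetition, in order.
string[] all = m.Groups["item"].Captures.Cast<Capture>().Select(c => c.Value).ToArray();

Console.WriteLine(last);
Console.WriteLine(string.Join("|", all));
```

This is why the code above reads `Captures` rather than `Value` for the repeated parameter groups.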

I think this is what you are looking for:

[A-Za-z][A-Za-z0-9_]*\(IN:((?:\s*(?:[A-Za-z][A-Za-z0-9_]*(?:[,;])))+|;)\s*OUT:((?:\s*[A-Za-z][A-Za-z0-9_]*,?)*)\)

There were a few problems with the grouping, and the pattern did not allow for the space between multiple IN parameters. It also did not allow for the underscores that appear in your examples.

The pattern above works with all of your examples.

Add_func(IN: port_0, in_port_1; OUT: out_port99) will capture:

  • port_0, in_port_1 and out_port99

Some_func(IN:;OUT: abc_P1) will capture:

  • ; and abc_P1

Some_func2(IN: input_portA; OUT:) will capture:

  • input_portA and empty.

After getting these capture groups, you can split them on commas (dropping the trailing ';' and any surrounding whitespace) to get your arrays.
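A minimal sketch of that last step, assuming a pattern in which each parameter list is captured whole by one group (groups 1 and 2 here) and C# 9 top-level statements. `ParseList` is a hypothetical helper name, and it uses `String.Split` with `StringSplitOptions.RemoveEmptyEntries` rather than a second regular expression.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: splits a captured list on ',' and ';', trims each
// entry, and discards anything that is empty after trimming.
static string[] ParseList(string captured) =>
    captured.Split(new[] { ',', ';' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(p => p.Trim())
            .Where(p => p.Length > 0)
            .ToArray();

string pattern = @"[A-Za-z][A-Za-z0-9_]*\(IN:((?:\s*(?:[A-Za-z][A-Za-z0-9_]*(?:[,;])))+|;)\s*OUT:((?:\s*[A-Za-z][A-Za-z0-9_]*,?)*)\)";

Match m = Regex.Match("Add_func(IN: port_0, in_port_1; OUT: out_port99)", pattern);
string[] inParams = ParseList(m.Groups[1].Value);
string[] outParams = ParseList(m.Groups[2].Value);

Console.WriteLine(string.Join(", ", inParams));
Console.WriteLine(string.Join(", ", outParams));
```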
