
Regex for matching content (function/namespace) between nested curly braces

I have read a lot of topics here about matching and capturing the string between curly braces in text, but I didn't find an answer for matching and capturing the content of functions (especially when there is some logic inside). So I hope this topic won't be a duplicate.

I need to match several things in code files (I have a lot of them, and they all have a similar structure but different nesting depth), like the one below.

Here are things I need to capture:

  1. Main class name

  2. Sub class names

  3. Sub class function names

  4. Content of each function

I need the first three to scan all our projects and map where those files (and their functions) are in use.

The last one is needed to match the function content against a list of specific services (internal and external) that can be used in those functions.

Code sample:

namespace Myprogramm.BusinessLogic
{
    public static class Utils
    {
        public static class Services
        {
            public static int GetSomeIDBySomeName()
            {
                // call some webservice
            }

            public static void UpdateViews()
            {
                // send some request
            }

            public static void IncreaseViews(int views)
            {
                if (views < 1000)
                {
                    // execute SQL SP1
                }
                else
                {
                    // execute SQL SP2
                }
            }
        }

        public static class SomeApi
        {
            public int OpenSomeSession(int someId)
            {
                if (someId < 0)
                {
                    // do something...
                }
                else
                {
                    // do something else ...
                }
            }
        }
    }
}

What I'm attempting to do is read those files as text and match their content against some regular expressions to capture the things I need.

I'm new to regular expressions, so I haven't had much success here. I can't figure out how to match and capture the content of the sub classes, and then how to do the same thing for the functions.

In another task I used this one to capture the content of simple functions (with no logic inside):

/{([^}]*)}/

And with this (also in another task to get content of the main class/namespace):

/{([\s\S]*)}/

And I do understand why neither of them helps me in this task.
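To make the failure concrete, here is a small sketch in Python (the choice of language and the trailing `Other` function are assumptions made only for this illustration). It runs both patterns against a trimmed copy of IncreaseViews from the sample:

```python
import re

# Trimmed copy of IncreaseViews from the sample above.
SOURCE = """public static void IncreaseViews(int views)
{
    if (views < 1000)
    {
        // execute SQL SP1
    }
    else
    {
        // execute SQL SP2
    }
}"""

# [^}]* cannot cross a closing brace, so the match stops at the
# first inner '}' and the else branch is lost.
first = re.search(r"\{([^}]*)\}", SOURCE)

# [\s\S]* is greedy, so the match runs from the first '{' to the
# last '}' in the text and swallows the next function as well.
# "Other" is a hypothetical second function added for the demo.
greedy = re.search(r"\{([\s\S]*)\}", SOURCE + "\nvoid Other() { }")

print("SP2" in first.group(1))          # False: cut off too early
print("void Other" in greedy.group(1))  # True: ran too far
```

Neither pattern keeps track of how many braces are currently open, which is what the nested case needs.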

To be clear, first of all I need to capture this one (to get the main class name) and its content:

public static class Utils {...}

*** this one I actually understand

Then those two (to capture sub classes names and their content):

  1. public static class Services {...}

  2. public static class SomeApi {...}

And then (just for the first sub class as an example):

  1. public static int GetSomeIDBySomeName() {...}

  2. public static void UpdateViews() {...}

  3. public static void IncreaseViews(int views) { if (views < 1000) {...} else {...} }

In Jeffrey Friedl's book Mastering Regular Expressions there's a suitable example on page 436.
How to match nested constructs is also explained at regular-expressions.info and weblogs.asp.net.

Adapted to curly braces, the example from those sources would look something like this:

{(?>[^{}]+|{(?<x>)|}(?<-x>))*(?(x)(?!))}

Here x tracks the nesting depth. The pattern relies on balancing groups, which are a .NET regex feature. Test it at regexhero.net

  • (?> opens an atomic group
  • [^{}] matches a character, that is not a brace
  • {(?<x>) adds to depth
  • }(?<-x>) subtracts from depth/stack
  • (?(x)(?!)) ensures depth is zero before meeting final }
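Outside .NET, the same bookkeeping can be done by hand. The following is a minimal Python sketch of what the balancing group does, not a drop-in replacement for the regex; the function name `match_braces` and the sample string are assumptions made for illustration:

```python
def match_braces(text, start):
    """Return the index just past the '}' that closes the '{' at text[start]."""
    if text[start] != "{":
        raise ValueError("text[start] must be '{'")
    depth = 0
    for i in range(start, len(text)):
        ch = text[i]
        if ch == "{":
            depth += 1      # plays the role of {(?<x>): push
        elif ch == "}":
            depth -= 1      # plays the role of }(?<-x>): pop
            if depth == 0:  # plays the role of (?(x)(?!)): stack is empty
                return i + 1
    raise ValueError("unbalanced braces")


sample = "void F() { if (a) { b(); } else { c(); } } void G() { }"
start = sample.index("{")
end = match_braces(sample, start)
print(sample[start:end])  # { if (a) { b(); } else { c(); } }
```

Like the regex, this sketch counts every brace it sees, so braces inside string literals or comments would throw the depth off.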

Reference - What does this regex mean

In general, languages with nesting belong to a different category (context-free languages) than the languages defined by regular expressions (regular languages). Regular languages have grammars that don't allow unbounded nesting, and they can be recognized efficiently by a deterministic or nondeterministic finite state automaton. Context-free languages need at least a stack-based (pushdown) automaton, where the stack stores the current level of parentheses. To parse nested parenthesized expressions with a pure regexp, you first have to approximate the language by a regular one: put an upper bound on the level of nesting that you allow, and the resulting language is regular. Only with such a bound can the nested language be described by a regular expression.

With the extensions that some languages (like Perl or Python) add to regexps, there are ways to cope with this partially (but not in general).

In your case, you have up to five levels of nesting (counting not only curly braces, but plain parentheses also). The automaton (and the regular expression that allows five levels to be parsed) will be complex anyway.
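The bounded-depth idea can be sketched by building the pattern level by level. This is an illustrative Python sketch; the helper name `bounded_brace_pattern` is an assumption, and only braces (not plain parentheses) are handled:

```python
import re


def bounded_brace_pattern(max_depth):
    """Regex for a {...} block with at most max_depth nested levels inside."""
    pattern = r"\{[^{}]*\}"  # depth 0: a block with no inner braces
    for _ in range(max_depth):
        # One more level: non-brace characters or a block of the previous depth.
        pattern = r"\{(?:[^{}]|" + pattern + r")*\}"
    return re.compile(pattern)


two_levels = bounded_brace_pattern(2)
print(bool(two_levels.fullmatch("{ a { b { c } } }")))        # True
print(bool(two_levels.fullmatch("{ a { b { c { d } } } }")))  # False
```

The pattern grows by a fixed amount with every level, and any input nested deeper than the chosen bound is rejected, which is the price of staying inside regular languages.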
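Given that complexity, one alternative is to keep the regex simple and leave the nesting to a brace counter. The sketch below, in Python, finds declaration headers with a small regex and then walks the braces to get each body and its enclosing class. It is not a C# parser: `DECL`, `closing_index` and `outline` are hypothetical names, the header regex assumes every class and function starts with an access modifier as in the sample, and braces inside strings or comments are not handled.

```python
import re

SAMPLE = """
namespace Myprogramm.BusinessLogic
{
    public static class Utils
    {
        public static class Services
        {
            public static void UpdateViews()
            {
                // send some request
            }

            public static void IncreaseViews(int views)
            {
                if (views < 1000)
                {
                    // execute SQL SP1
                }
                else
                {
                    // execute SQL SP2
                }
            }
        }

        public static class SomeApi
        {
            public int OpenSomeSession(int someId)
            {
                // do something...
            }
        }
    }
}
"""

# Header of a class or function, up to and including its opening brace.
DECL = re.compile(
    r"(?:public|private|protected|internal)\s+(?:static\s+)?"
    r"(?:class\s+(?P<cls>\w+)|\w+\s+(?P<func>\w+)\s*\([^)]*\))\s*\{"
)


def closing_index(text, open_idx):
    """Index of the '}' that closes the '{' at open_idx."""
    depth = 0
    for i in range(open_idx, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("unbalanced braces")


def outline(text):
    """Return (kind, name, parent class, body) for each declaration found."""
    results = []
    open_classes = []  # (class name, index of its closing brace)
    for m in DECL.finditer(text):
        open_idx = m.end() - 1
        close_idx = closing_index(text, open_idx)
        # Drop classes that ended before this declaration starts.
        while open_classes and open_classes[-1][1] < m.start():
            open_classes.pop()
        parent = open_classes[-1][0] if open_classes else None
        body = text[open_idx + 1:close_idx].strip()
        if m.group("cls"):
            results.append(("class", m.group("cls"), parent, body))
            open_classes.append((m.group("cls"), close_idx))
        else:
            results.append(("function", m.group("func"), parent, body))
    return results


for kind, name, parent, _body in outline(SAMPLE):
    print(kind, name, "in", parent)
```

Each function body returned this way can then be searched for the service names with ordinary string or regex matching.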
