Finding functions with regex?

In Notepad++, there is a lot of helpful syntax highlighting for various programming languages, and I was wondering how it does some of it. I want to know how it can tell the different scopes of functions apart.

For example, how would it differentiate between the inner and the outer function here?

function myFunction(arguments){
    function functionInsideMyFunction(arguments){
        return 0;
    }
}

I'm sure it's very simple, but I'm new to regex and still have a bit of trouble understanding it.

Say, for example, I wanted a regex to only match functions that aren't in other functions. I would want to get myFunction, but not functionInsideMyFunction.

It does not use regex; it uses something called a lexer. The lexer for jEdit is JFlex (http://jflex.de). Lexers are confusing to me, but you can learn them if you want. If you use Java, you can futz with the internals of classes through YourClass.class.whatev, and you can even manipulate them with http://Commons.apache.org/bcel. Notepad++ uses something similar. Regex simply isn't expressive enough for anything beyond basic line and string parsing.
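To make the idea of a lexer a little less confusing, here is a minimal sketch of one in JavaScript. It is not how Notepad++ or JFlex is implemented; the function name tokenize, the token kinds, and the KEYWORDS set are assumptions made up for illustration. It only shows the basic job of a lexer: walk through the text once and emit one typed token at a time, which a highlighter or parser can then work with.

```javascript
// Hypothetical minimal lexer: splits source text into typed tokens.
// The token kinds and the KEYWORDS set are illustrative assumptions.
const KEYWORDS = new Set(["function", "return"]);

function tokenize(source) {
  const tokens = [];
  // One alternative per token class. The "y" (sticky) flag forces each
  // match to start exactly at lastIndex, so no input is silently skipped.
  const pattern = /\s+|([A-Za-z_$][A-Za-z0-9_$]*)|(\d+)|([{}();,])/y;
  let pos = 0;
  while (pos < source.length) {
    pattern.lastIndex = pos;
    const m = pattern.exec(source);
    if (m === null) {
      throw new Error(`Unexpected character at offset ${pos}: ${source[pos]}`);
    }
    if (m[1] !== undefined) {
      const kind = KEYWORDS.has(m[1]) ? "keyword" : "identifier";
      tokens.push({ kind, text: m[1] });
    } else if (m[2] !== undefined) {
      tokens.push({ kind: "number", text: m[2] });
    } else if (m[3] !== undefined) {
      tokens.push({ kind: "punct", text: m[3] });
    }
    // Whitespace matches none of the groups and is simply dropped.
    pos = pattern.lastIndex;
  }
  return tokens;
}

const tokens = tokenize("function f(a){ return 0; }");
console.log(tokens.map((t) => t.kind + ":" + t.text).join(" "));
```

Note that the individual token patterns are still small regular expressions; what regex alone lacks is the surrounding loop and any state that is kept between tokens.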

Regular expressions describe a class of formal languages (the regular languages) that sits at the lower end of 'expressiveness'. There are many language constructs that you cannot parse with a regular expression; a good example is the balanced-parentheses problem.

What you're looking at with the function definitions above is basically the same thing, with the 'opening parenthesis' being function...(...){ and the 'closing parenthesis' being a simple }.

This problem is not solvable with a true regular expression, because arbitrarily nested functions form a language of higher expressiveness (see also the Chomsky hierarchy).
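For the concrete case in the question, a rough sketch of what is missing looks like this: a regex can find the individual pieces (function headers and braces), but something outside the regex has to count how deeply nested we currently are. The function name topLevelFunctions is made up for this example, and the sketch assumes there are no braces or the word function inside strings, comments, or regex literals, which a real lexer would handle.

```javascript
// Sketch: report only functions declared at brace depth 0.
// Assumption: no braces inside strings, comments, or regex literals.
function topLevelFunctions(source) {
  const names = [];
  let depth = 0; // the state a plain regular expression cannot keep
  // Match either a function header (capturing its name) or a single brace.
  const pattern = /\bfunction\s+([A-Za-z_$][\w$]*)|[{}]/g;
  for (const m of source.matchAll(pattern)) {
    if (m[1] !== undefined) {
      if (depth === 0) names.push(m[1]);
    } else if (m[0] === "{") {
      depth += 1;
    } else {
      depth -= 1;
    }
  }
  return names;
}

const sample = `
function myFunction(arguments){
    function functionInsideMyFunction(arguments){
        return 0;
    }
}`;
console.log(topLevelFunctions(sample)); // [ 'myFunction' ]
```

The depth counter is what turns this from pattern matching into (very simple) parsing: it plays the role of the stack in the balanced-parentheses problem.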

To parse languages more expressive than Chomsky type 3 (the regular languages, which is what regex can handle), you need a proper parser. There are many different techniques and algorithms, each suited to languages of a certain expressiveness. Explaining them here would be overkill; if you really want to get into it, I suggest reading about context-free grammars, LR parsers and LL parsers (these are used a lot when parsing programming languages).
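As a small taste of what such a parser looks like, here is a hedged sketch of a hand-written recursive-descent parser (the simplest LL-style technique) for a toy grammar that only understands nested function definitions. The grammar, the function name parseFunctions, and the shape of the returned tree are assumptions made for this example; it is nowhere near a real JavaScript parser, and it does not handle strings or comments.

```javascript
// Sketch of a recursive-descent (LL-style) parser for a toy grammar:
//   body     := (function | block | otherToken)*
//   function := "function" NAME "(" anything ")" "{" body "}"
//   block    := "{" body "}"
function parseFunctions(source) {
  // Very crude tokenizer: identifiers, numbers, or any single non-space char.
  const tokens = source.match(/[A-Za-z_$][\w$]*|\d+|\S/g) || [];
  let pos = 0;

  function expect(text) {
    if (tokens[pos] !== text) {
      throw new Error(`Expected "${text}" at token ${pos}, found "${tokens[pos]}"`);
    }
    pos += 1;
  }

  function parseBody() {
    const found = [];
    while (pos < tokens.length && tokens[pos] !== "}") {
      if (tokens[pos] === "function") {
        found.push(parseFunction());
      } else if (tokens[pos] === "{") {
        pos += 1; // consume "{" of a plain block
        found.push(...parseBody()); // its functions belong to the enclosing body
        expect("}");
      } else {
        pos += 1; // any other token is irrelevant to this grammar
      }
    }
    return found;
  }

  function parseFunction() {
    expect("function");
    const name = tokens[pos++];
    expect("(");
    while (pos < tokens.length && tokens[pos] !== ")") pos += 1; // skip parameters
    expect(")");
    expect("{");
    const children = parseBody(); // the recursion mirrors the nesting
    expect("}");
    return { name, children };
  }

  return parseBody();
}

const nested = "function myFunction(a){ function functionInsideMyFunction(b){ return 0; } }";
console.log(JSON.stringify(parseFunctions(nested)));
```

Each grammar rule becomes one function, and the call stack of those functions does the bookkeeping for nesting. The top-level entries of the returned array are exactly the functions that are not inside other functions.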
