Regular Expressions for matching functions in javascript source code?

Is there any way to match a function block in javascript source code using regular expressions?

(Really I'm trying to find the opposite of that, but I figured this would be a good place to start.)

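I have a quite effective JavaScript solution, contrary to everyone else's belief. Try this; I've used it and it works great (the backslashes are escaped for use inside a JavaScript string):

function\\s*([A-z0-9]+)?\\s*\\((?:[^)(]+|\\((?:[^)(]+|\\([^)(]*\\))*\\))*\\)\\s*\\{(?:[^}{]+|\\{(?:[^}{]+|\\{[^}{]*\\})*\\})*\\}

It does have a fixed limit: it handles at most three levels of nested parentheses in the parameter list and three levels of nested braces (counting the function body itself).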

https://regex101.com/r/zV2fO7/1
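A minimal sketch of how that pattern might be used from JavaScript, assuming the source is already in a string. The sample input and the names pattern, fnRegex, src and tooDeep are hypothetical; the last line shows the fixed-depth limit in action.

```javascript
// The pattern from the answer above, kept as a JavaScript string
// (hence the doubled backslashes), split in two only for line length.
const pattern =
  "function\\s*([A-z0-9]+)?\\s*\\((?:[^)(]+|\\((?:[^)(]+|\\([^)(]*\\))*\\))*\\)" +
  "\\s*\\{(?:[^}{]+|\\{(?:[^}{]+|\\{[^}{]*\\})*\\})*\\}";

// The "g" flag lets matchAll iterate over every function in the input.
const fnRegex = new RegExp(pattern, "g");

// Hypothetical input: one flat function, one with two levels of inner braces.
const src = `
function add(a, b) { return a + b; }
function check(x) {
  if (x) {
    return { ok: true };
  }
  return null;
}
`;

const matches = [...src.matchAll(fnRegex)];
const names = matches.map((m) => m[1]); // group 1 is the optional function name
console.log(names); // → [ 'add', 'check' ]

// Four levels of braces exceed what the pattern can express, so nothing matches.
const tooDeep = "function deep() { if (a) { if (b) { if (c) { x(); } } } }";
console.log(new RegExp(pattern).test(tooDeep)); // → false
```

Because the depth is baked into the pattern, supporting one more level means nesting another copy of the brace alternation by hand.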

There are certain things that regular expressions just aren't very good at. That doesn't mean it's impossible to build an expression that will work, just that it's probably not a good fit. Among those things:

  • multi-line input
  • nesting

Javascript function blocks tend to cover multiple lines, and you are going to want to find the matching "{" and "}" braces that signify the start and end of the block, which could be nested to an unknown depth. You also need to account for potential braces used inside comments. RegEx will be painful for this.

That doesn't mean it's impossible, though. You might have additional information about the nature of the functions you're looking for. If you can do things like guarantee no braces in comments and limit nesting to a specific depth, you could still build an expression to do it. It'll be somewhat messy and hard to maintain, but at least within the realm of the possible.
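If the nesting depth really is bounded, the expression can be generated rather than written by hand. A sketch under the same assumptions as above (no braces in strings or comments); bracePattern and exactly are hypothetical helper names:

```javascript
// Build a regex source string matching a {...} block whose braces
// nest at most `depth` levels deep (depth 1 = no inner braces at all).
function bracePattern(depth) {
  if (depth < 1) throw new RangeError("depth must be >= 1");
  let inner = "\\{[^{}]*\\}"; // innermost level: brace-free content
  for (let d = 2; d <= depth; d++) {
    // Each extra level allows either one non-brace character or a block
    // from the level below. Single characters avoid nested-quantifier backtracking.
    inner = "\\{(?:[^{}]|" + inner + ")*\\}";
  }
  return inner;
}

const twoLevels = new RegExp(bracePattern(2));
console.log(twoLevels.exec("x = { a: { b: 1 } };")[0]); // → { a: { b: 1 } }

// Anchored version: the whole string must be one block within the depth limit.
const exactly = (d) => new RegExp("^" + bracePattern(d) + "$");
console.log(exactly(2).test("{ { { } } }")); // → false (three levels)
console.log(exactly(3).test("{ { { } } }")); // → true
```

Every level adds another copy of the alternation, which is exactly the "messy and hard to maintain" part: the generator hides it, but the resulting pattern still breaks on a brace inside a string or comment.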

Not really, no.

Function blocks aren't regular (balanced braces can nest to any depth), so regular expressions aren't the right tool for the job. To capture a function block in JS, you need to count instances of { and balance them against instances of }; otherwise you're going to match too much or too little. Regular expressions can't do this kind of unbounded counting.

Just read in the file you're trying to look at and manage the nesting recursively. It's conceptually very easy to manage this way.
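A minimal sketch of that approach, where a recursive function does the brace matching instead of a regex. It assumes braces never occur inside strings, comments or regex literals, and that the first "{" after the function keyword opens the body; findBlockEnd and extractFunctions are hypothetical names.

```javascript
// Given the index of a "{" in src, return the index just past its matching "}".
// Each nested block is handled by a recursive call.
function findBlockEnd(src, open) {
  let i = open + 1;
  while (i < src.length) {
    if (src[i] === "{") {
      i = findBlockEnd(src, i); // skip over the whole nested block
    } else if (src[i] === "}") {
      return i + 1; // this "}" closes the block that started at `open`
    } else {
      i++;
    }
  }
  throw new SyntaxError("unbalanced braces starting at index " + open);
}

// Collect the source text of each top-level "function ... { ... }".
function extractFunctions(src) {
  const results = [];
  const keyword = /\bfunction\b/g;
  let m;
  while ((m = keyword.exec(src)) !== null) {
    const open = src.indexOf("{", m.index);
    if (open === -1) break;
    const end = findBlockEnd(src, open);
    results.push(src.slice(m.index, end));
    keyword.lastIndex = end; // resume after this function; nested ones are not listed separately
  }
  return results;
}

const code = `function foo() {
    if(bar) {
        baz();
    }
}
var x = 1;`;

const found = extractFunctions(code);
console.log(found.length); // → 1
console.log(found[0].endsWith("baz();\n    }\n}")); // → true
```

The recursion depth equals the brace nesting depth, which is the counting a plain regular expression cannot do.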

No, it is not possible. Regexes can't match arbitrarily nested pairs of characters, so something like this would fool it:

function foo() {
    if(bar) {
        baz();
    } // oops, regex would think this was end of function
}
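A quick way to see the failure, assuming a typical non-greedy pattern such as the one below (a hypothetical example, not one posted in this thread):

```javascript
const code = `function foo() {
    if(bar) {
        baz();
    } // oops, regex would think this was end of function
}`;

// Non-greedy: stops at the first "}" it reaches, which closes the if, not the function.
const naive = /function\s+\w+\s*\([^)]*\)\s*\{[\s\S]*?\}/;
const match = code.match(naive)[0];
console.log(match.endsWith("baz();\n    }")); // → true: the match ends at the if's "}"
console.log(match === code); // → false: the function's own "}" is never reached
```

Switching to a greedy quantifier only moves the problem: with two functions in the input, the match runs on to the last "}" of the second one.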

However, you could create a fairly simple grammar to do it (in EBNF-ish form):

javascript_func
: "function" ID "(" ")" "{" body* "}"
| "function" ID "(" params ")" "{" body* "}"
;

params
: ID
| params "," ID
;

body
: [^{}]+ // one or more non-brace characters, regex-style
| "{" body* "}"
;

This also assumes you have some kind of lexer to strip out whitespace and comments.
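That grammar maps almost directly onto a recursive-descent parser. A rough sketch, assuming comments were already removed by the lexer step (whitespace is simply skipped here) and reading body as one or more non-brace characters; parseFunction and the shape of the object it returns are hypothetical.

```javascript
// Recursive-descent sketch of the grammar:
//   javascript_func : "function" ID "(" params? ")" "{" body* "}"
//   params          : ID | params "," ID
//   body            : [^{}]+ | "{" body* "}"
function parseFunction(src) {
  let pos = 0;

  const skipWs = () => {
    while (pos < src.length && /\s/.test(src[pos])) pos++;
  };
  const expect = (tok) => {
    skipWs();
    if (!src.startsWith(tok, pos)) throw new SyntaxError(`expected "${tok}" at ${pos}`);
    pos += tok.length;
  };
  const id = () => {
    skipWs();
    const m = /^[A-Za-z_$][\w$]*/.exec(src.slice(pos));
    if (!m) throw new SyntaxError(`expected identifier at ${pos}`);
    pos += m[0].length;
    return m[0];
  };
  // body*: consume text and nested blocks up to the "}" that closes this block.
  const bodies = () => {
    while (pos < src.length && src[pos] !== "}") {
      if (src[pos] === "{") {
        pos++;       // "{"
        bodies();    // body*
        expect("}"); // "}"
      } else {
        pos++;       // one character of [^{}]+
      }
    }
  };

  expect("function");
  const name = id();
  expect("(");
  const params = [];
  skipWs();
  if (src[pos] !== ")") {
    params.push(id());
    skipWs();
    while (src[pos] === ",") {
      pos++;
      params.push(id());
      skipWs();
    }
  }
  expect(")");
  expect("{");
  const start = pos;
  bodies();
  const body = src.slice(start, pos);
  expect("}");
  return { name, params, body, end: pos };
}

const result = parseFunction("function foo(a, b) { if (a) { b(); } }");
console.log(result.name); // → foo
console.log(JSON.stringify(result.params)); // → ["a","b"]
```

Each nonterminal becomes one function, and the "{" body* "}" rule becomes a recursive call to bodies(), so nesting depth is limited only by the call stack.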

Some regex engines do allow recursion. In PCRE (which PHP's preg_* functions use), for example, you could match nested brackets like so:

{(?:[^{}]+|(?R))*+}

(?R) "pastes" the entire expression in its place. To capture functions, recursing into a subgroup is more useful; (?-1) refers to the most recently opened capturing group, here the brace group:

function[^{]+({(?:[^{}]+|(?-1))*+})

Then we might want to skip over comments containing brackets, so they don't break the matching (this needs the s and m flags):

function\s+\w+\s*\([^{]+({(?:[^{}]+\/\*.*?\*\/|[^{}]+\/\/.*?$|[^{}]+|(?-1))*+})

This should work for basic cases, but there are still strings containing '}', strings with escaped quotes, and other things to worry about.

Here's a demo: https://regex101.com/r/fG4gO1/2
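Strings containing '}' and escaped quotes are easier to handle in a small hand-written scanner than in the regex. A sketch that skips string literals (honouring backslash escapes), // line comments and /* */ block comments while counting braces; it still ignores regex literals and nested template literals, and matchBrace is a hypothetical name.

```javascript
// Return the index just past the "}" matching the "{" at src[open],
// ignoring braces inside strings and comments.
function matchBrace(src, open) {
  let depth = 0;
  let i = open;
  while (i < src.length) {
    const ch = src[i];
    if (ch === '"' || ch === "'" || ch === "`") {
      // Skip a string literal; a backslash escapes the next character.
      i++;
      while (i < src.length && src[i] !== ch) {
        i += src[i] === "\\" ? 2 : 1;
      }
      i++; // step past the closing quote
    } else if (src.startsWith("//", i)) {
      const nl = src.indexOf("\n", i);
      i = nl === -1 ? src.length : nl; // skip to end of line
    } else if (src.startsWith("/*", i)) {
      const close = src.indexOf("*/", i + 2);
      i = close === -1 ? src.length : close + 2; // skip the whole block comment
    } else {
      if (ch === "{") depth++;
      else if (ch === "}" && --depth === 0) return i + 1;
      i++;
    }
  }
  throw new SyntaxError("no matching brace for index " + open);
}

// Hypothetical input with braces hidden in a string and in both comment styles.
const code = `function f() {
  var s = "a } with \\" escaped quote";
  // a stray } in a comment
  /* and a stray { here */
  return s;
}
trailing();`;

const end = matchBrace(code, code.indexOf("{"));
console.log(code.slice(0, end).endsWith("return s;\n}")); // → true
```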

After a day of fiddling with it for my own project, here is a regex that matches all named functions in a JS file and splits each one into its name, arguments, and body.

function\s+(?<functionName>\w+)\s*\((?<functionArguments>[^()]*)\s*\)\s*(?<functionBody>{(?:[^{}]+|(?-1))*+})

https://regex101.com/r/sXrHLI/1
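JavaScript's RegExp supports named groups but not recursion ((?R), (?-1)) or possessive quantifiers (*+), so that pattern will not compile there. One possible workaround, sketched under the same no-braces-in-strings-or-comments assumption: keep the named groups for the header and find the body by counting braces. parseNamedFunctions is a hypothetical helper.

```javascript
// Named groups capture the header; a brace counter finds the end of the body.
function parseNamedFunctions(src) {
  // A regex literal here is a fresh object per call, so the "g" flag's
  // lastIndex state never leaks between calls.
  const header = /function\s+(?<functionName>\w+)\s*\((?<functionArguments>[^()]*)\)\s*\{/g;
  const out = [];
  let m;
  while ((m = header.exec(src)) !== null) {
    const open = header.lastIndex - 1; // index of the "{" that ends the header
    let depth = 1;
    let i = open + 1;
    while (i < src.length && depth > 0) {
      if (src[i] === "{") depth++;
      else if (src[i] === "}") depth--;
      i++;
    }
    if (depth !== 0) break; // unbalanced input: stop rather than guess
    out.push({
      functionName: m.groups.functionName,
      functionArguments: m.groups.functionArguments.trim(),
      functionBody: src.slice(open, i), // includes the outer braces
    });
    header.lastIndex = i; // resume after this function's closing "}"
  }
  return out;
}

const fns = parseNamedFunctions("function add(a, b) { return a + b; }\nfunction noop() {}");
console.log(fns.map((f) => f.functionName)); // → [ 'add', 'noop' ]
```

The group names match the ones in the PCRE pattern above, so the output has the same three fields; nested functions end up inside their parent's functionBody rather than as separate entries.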

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM