
Function Parser with RegEx in Python

I have some Fortran source code (the language is almost irrelevant) and I want to parse the function names and arguments.

For example, using

(\w+)\([^\(\)]+\)

with

a(b(1 + 2 * 2), c(3,4))

I get the following (as expected):

b, 1 + 2 * 2
c, 3,4

where I would need

a, b(1 + 2 * 2), c(3,4)
b, 1 + 2 * 2
c, 3,4
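For reference, the current behaviour can be reproduced with the short sketch below. It uses the pattern from above with one addition that is not in the original: a second capture group around the argument text, so that the arguments are returned along with the name.

```python
import re

source = "a(b(1 + 2 * 2), c(3,4))"

# Same pattern as above, plus a second group (an addition for this sketch)
# that captures the text between the parentheses.
pattern = re.compile(r"(\w+)\(([^()]+)\)")

# findall returns one (name, args) tuple per non-overlapping match.
matches = pattern.findall(source)

for name, args in matches:
    print(f"{name}, {args}")
```

Only the innermost calls are found, because `[^()]+` refuses to cross a parenthesis, so the outer call `a(...)` can never match.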

Any suggestions?

Thanks for your time...

I don't think this is a job for regular expressions... they can't really handle nested patterns.

This is because regexes, in the formal sense, are equivalent to FSMs (Finite State Machines). An FSM cannot parse arbitrarily nested expressions, because keeping track of an arbitrary nesting depth would require infinitely many states. Also see this SO thread.

This is a nonlinear grammar -- you need to be able to recurse on a set of allowed rules. Look at pyparsing to do simple CFG (Context Free Grammar) parsing via readable specifications.

It's been a while since I've written out CFGs, and I'm probably rusty, so I'll refer you to the Python EBNF to get an idea of how you can construct one for a subset of a language syntax.

Edit: If the example will always be simple, you can code a small state machine class/function that iterates over the tokenized input string, as @Devin Jeanpierre suggests.
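A minimal sketch of such a hand-written scanner, assuming the input looks like the example in the question. The function name `find_calls` is made up for illustration, and the sketch works character by character rather than on real tokens. It keeps a stack of open parentheses, which is exactly the unbounded memory a plain FSM lacks.

```python
def find_calls(text):
    """Return (name, args) pairs in the order the calls open."""
    calls = []         # [name, args] entries, filled in when the call closes
    stack = []         # (index into calls or None, index where the args start)
    name_start = None  # start of the identifier currently being read, if any

    for i, ch in enumerate(text):
        if ch.isalnum() or ch == "_":
            if name_start is None:
                name_start = i
            continue
        if ch == "(":
            if name_start is not None:
                # An identifier directly followed by "(" opens a call.
                calls.append([text[name_start:i], None])
                stack.append((len(calls) - 1, i + 1))
            else:
                # A plain grouping parenthesis, e.g. (1 + 2).
                stack.append((None, i + 1))
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at index %d" % i)
            slot, start = stack.pop()
            if slot is not None:
                calls[slot][1] = text[start:i]
        name_start = None  # any non-identifier character ends the name

    if stack:
        raise ValueError("unbalanced '('")
    return [tuple(call) for call in calls]


for name, args in find_calls("a(b(1 + 2 * 2), c(3,4))"):
    print(f"{name}, {args}")
```

One known limitation of this sketch: whitespace between a name and its opening parenthesis (as in `a (b)`) makes the parenthesis count as plain grouping, and string literals or comments are not handled at all.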

It can be done with regular expressions: use them to tokenize the string, and then work with the tokens. For example, see re.Scanner. Alternatively, just use pyparsing.
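A rough sketch of the tokenize-then-process idea with `re.Scanner` (an undocumented but long-standing class in the standard `re` module). The token kinds (`NAME`, `LPAREN`, `RPAREN`, `OTHER`) and the helper `calls_from_tokens` are assumptions made for this example; whitespace is kept inside `OTHER` tokens so the argument text can be rebuilt by joining tokens.

```python
import re

# Each lexicon entry is (pattern, action); the action receives the scanner
# and the matched text and returns the token to emit.
scanner = re.Scanner([
    (r"[A-Za-z_]\w*", lambda s, t: ("NAME", t)),
    (r"\(", lambda s, t: ("LPAREN", t)),
    (r"\)", lambda s, t: ("RPAREN", t)),
    # Numbers, operators, commas and whitespace are all lumped together.
    (r"\d[\w.]*|[^\w()]+", lambda s, t: ("OTHER", t)),
])


def calls_from_tokens(tokens):
    """Walk the token list with a stack and collect (name, args) pairs."""
    calls = []
    stack = []  # (index into calls or None, index of the first argument token)
    for i, (kind, _) in enumerate(tokens):
        if kind == "LPAREN":
            if i > 0 and tokens[i - 1][0] == "NAME":
                calls.append([tokens[i - 1][1], None])
                stack.append((len(calls) - 1, i + 1))
            else:
                stack.append((None, i + 1))  # grouping parenthesis
        elif kind == "RPAREN":
            if not stack:
                raise ValueError("unbalanced ')'")
            slot, start = stack.pop()
            if slot is not None:
                calls[slot][1] = "".join(text for _, text in tokens[start:i])
    if stack:
        raise ValueError("unbalanced '('")
    return [tuple(call) for call in calls]


tokens, remainder = scanner.scan("a(b(1 + 2 * 2), c(3,4))")
if remainder:
    raise ValueError("could not tokenize: %r" % remainder)

for name, args in calls_from_tokens(tokens):
    print(f"{name}, {args}")
```

The regular expressions only recognise individual tokens here; the nesting is handled by ordinary code with a stack.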

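You could take a look at PLY (Python Lex-Yacc). It is (in my opinion) very simple to use and well documented, and it comes with a calculator example, which might be a good starting point.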

You can't do this with regular expressions alone, because the problem is recursive. First match the outermost function and its arguments and print the function name, then do the same (match the function name, then its arguments) for everything inside its arguments.
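The recursive idea can be sketched roughly as follows. The names `NAME_OPEN`, `matching_paren` and `walk_calls` are hypothetical helpers invented for this example: a regex finds the next function name, a counter finds the parenthesis that closes it, and the function then recurses into the argument text.

```python
import re

# A name immediately followed by an opening parenthesis.
NAME_OPEN = re.compile(r"(\w+)\(")


def matching_paren(text, open_idx):
    """Return the index of the ')' that closes the '(' at open_idx."""
    depth = 0
    for i in range(open_idx, len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("unbalanced parentheses")


def walk_calls(text):
    """Collect (name, args) pairs, outermost call first."""
    found = []
    pos = 0
    while True:
        match = NAME_OPEN.search(text, pos)
        if match is None:
            return found
        close = matching_paren(text, match.end() - 1)
        args = text[match.end():close]
        found.append((match.group(1), args))
        found.extend(walk_calls(args))  # same procedure on the arguments
        pos = close + 1                 # continue after this call


for name, args in walk_calls("a(b(1 + 2 * 2), c(3,4))"):
    print(f"{name}, {args}")
```

The regex is only used to locate names; the balanced-parenthesis matching is done by the counter, which is the part a regular expression cannot express.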


 