How do I use a Python regex to match the function syntax of MATLAB?

I am trying to find all the inputs/outputs of all MATLAB functions in our internal library. I am new to regex (this is my first time) and have been trying to use the multiline mode in Python's re library.

The MATLAB function syntax looks like:

function output = func_name(input)

where the signature can span multiple lines.

I started with a pattern like:

re.compile(r"^.*function (.*)=(.*)\([.\n]*\)$", re.M)

but I keep getting an "unsupported template operator" error. Any pointers are appreciated!

EDIT:

Now I have:

pattern = re.compile(r"^\s*function (.*?)= [\w\n.]*?\(.*?\)", re.M|re.DOTALL)

which gives matches like:

        function [fcst, spread] = ...
                VolFcstMKT(R,...
                           mktVol,...
                           calibrate,...
                           spread_init,...
                           fcstdays,...
                           tsperyear)

        if(calibrate)
            if(nargin < 6)
                tsperyear = 252;
            end
            templen = length(R)

My question is: why does it give the extra lines instead of stopping at the first )?

The peculiar (internal) error you're getting is what you would see if you passed re.T instead of re.M as the second argument to re.compile (re.template, a currently undocumented entry, is the one intended to use it, and, in brief, template REs don't support repetition or backtracking). Can you print re.M to show its value in your code before you call re.compile?

Once that's fixed, we can discuss the details of your desired RE (in brief: if the input part can include parentheses you're out of luck, otherwise re.DOTALL and some rewriting of your pattern should help), but fixing this weird internal error seems to take priority.

Edit: with this bug diagnosed (as per the comments below this question), moving on to the OP's current question: re.DOTALL|re.MULTILINE, plus the '$' at the end of the pattern, plus the everywhere-greedy matches (using .* instead of .*? for non-greedy), all together ensure that if the regex matches, it will match as broad a swathe as possible. That's exactly what this combination asks for. It's probably best to open another question with a specific example: what the input is, what gets matched, what you would like the regex to match instead, and so on.
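For the pattern in the question's EDIT specifically, one likely cause of the extra lines is that the character class [\w\n.] contains no space or tab. It therefore cannot cross the indentation in front of VolFcstMKT, and the lazy (.*?) under re.DOTALL keeps expanding until it reaches a later "= name(" sequence. The sketch below reproduces this on a shortened sample modeled on the question. Swapping \n for \s in the class is an assumption about what was intended, not a complete fix: the pattern still requires "= " and so still misses functions without outputs.

```python
import re

# Shortened sample modeled on the match shown in the question.
SAMPLE = """\
        function [fcst, spread] = ...
                VolFcstMKT(R,...
                           mktVol,...
                           tsperyear)

        if(calibrate)
            if(nargin < 6)
                tsperyear = 252;
            end
            templen = length(R)
"""

# Pattern from the question's EDIT: the class [\w\n.] has no space or tab.
original = re.compile(r"^\s*function (.*?)= [\w\n.]*?\(.*?\)", re.M | re.DOTALL)

# Same pattern with \s in the class, so indented continuation lines can match.
adjusted = re.compile(r"^\s*function (.*?)= [\w\s.]*?\(.*?\)", re.M | re.DOTALL)

original_match = original.search(SAMPLE).group(0)
adjusted_match = adjusted.search(SAMPLE).group(0)

print(original_match)   # runs on until "templen = length(R)"
print("-" * 40)
print(adjusted_match)   # stops at "tsperyear)"
```

With the original class, the first position where "= " is followed only by word characters, newlines, or periods and then "(" is "templen = length(R)", which is why the match ends there.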

Here's a regular expression that should match any MATLAB function declaration at the start of an m-file:

^\s*function\s+((\[[\w\s,.]*\]|[\w]*)\s*=)?[\s.]*\w+(\([^)]*\))?

And here's a more detailed explanation of the components:

^\s*             # Match 0 or more whitespace characters
                 #    at the start
function         # Match the word function
\s+              # Match 1 or more whitespace characters
(                # Start grouping 1
 (               # Start grouping 2
  \[             # Match opening bracket
  [\w\s,.]*      # Match 0 or more letters, numbers,
                 #    whitespace, underscores, commas,
                 #    or periods...
  \]             # Match closing bracket
  |[\w]*         # ... or match 0 or more letters,
                 #    numbers, or underscores
 )               # End grouping 2
 \s*             # Match 0 or more whitespace characters
 =               # Match an equal sign
)?               # End grouping 1; Match it 0 or 1 times
[\s.]*           # Match 0 or more whitespace characters
                 #    or periods
\w+              # Match 1 or more letters, numbers, or
                 #    underscores
(                # Start grouping 3
 \(              # Match opening parenthesis
 [^)]*           # Match 0 or more characters that
                 #    aren't a closing parenthesis
 \)              # Match closing parenthesis
)?               # End grouping 3; Match it 0 or 1 times
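As a rough sketch of how this pattern could be used from Python, the snippet below compiles it with re.VERBOSE and re.MULTILINE and pulls out the outputs, name, and inputs of each declaration. The named groups (outputs, name, inputs) and the helpers split_names and find_signatures are additions made for this illustration and are not part of the pattern above. It also assumes, as the pattern does, that argument lists contain no nested parentheses.

```python
import re

# The pattern above in verbose form. The named groups are an addition for
# this sketch; the matching logic is otherwise unchanged.
SIGNATURE = re.compile(
    r"""
    ^\s*function\s+
    (?:(?P<outputs>\[[\w\s,.]*\]|\w*)\s*=)?   # optional output list and '='
    [\s.]*                                    # whitespace or '...' continuation
    (?P<name>\w+)                             # function name
    (?:\((?P<inputs>[^)]*)\))?                # optional argument list
    """,
    re.MULTILINE | re.VERBOSE,
)


def split_names(text):
    """Split an output or input list into individual names."""
    if not text:
        return []
    # Drop surrounding brackets and '...' continuation markers.
    cleaned = text.strip("[]").replace("...", " ")
    return [name for name in re.split(r"[,\s]+", cleaned) if name]


def find_signatures(source):
    """Return (outputs, name, inputs) for each declaration found in source."""
    return [
        (split_names(m.group("outputs")), m.group("name"), split_names(m.group("inputs")))
        for m in SIGNATURE.finditer(source)
    ]


SAMPLE = """\
function [fcst, spread] = ...
        VolFcstMKT(R,...
                   mktVol)
x = 1;
function helper
function out1 = func_name(in1,~,in3)
"""

found = find_signatures(SAMPLE)
for outputs, name, inputs in found:
    print(outputs, name, inputs)
```

Because the flags include re.MULTILINE, finditer reports every declaration in a file rather than only the one at the start, so subfunctions are picked up as well.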

Whether you use regular expressions or basic string operations, you should keep in mind the different forms that the function declaration can take in MATLAB. The general form is:

function [out1,out2,...] = func_name(in1,in2,...)

Specifically, you could see any of the following forms:

function func_name                 %# No inputs or outputs
function func_name(in1)            %# 1 input
function func_name(in1,in2)        %# 2 inputs
function out1 = func_name          %# 1 output
function [out1] = func_name        %# Also 1 output
function [out1,out2] = func_name   %# 2 outputs
...

You can also have line continuations (...) at many points, such as after the equal sign or within the argument list:

function out1 = ...
    func_name(in1,...
              in2,...
              in3)
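One way to simplify either approach is to join continued lines before looking for declarations. The sketch below is a naive preprocessing step, with join_continuations being a hypothetical helper name: it replaces each ... together with the rest of that line with a single space. It assumes that ... never appears inside a string literal or a comment in the text being processed, which will not hold for every m-file.

```python
import re


def join_continuations(source):
    """Replace each '...' continuation (and the rest of its line) with a space."""
    return re.sub(r"\.\.\.[^\n]*\n", " ", source)


DECLARATION = (
    "function out1 = ...\n"
    "    func_name(in1,...\n"
    "              in2,...\n"
    "              in3)\n"
)

joined = join_continuations(DECLARATION)
print(joined)
```

After this step the whole declaration sits on one line, so a line-by-line scan or a single-line regex can handle it.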

You may also want to take into account factors like variable input argument lists and ignored input arguments:

function func_name(varargin)       %# Any number of inputs possible
function func_name(in1,~,in3)      %# Second of three inputs is ignored

Of course, many m-files contain more than one function, so you will have to decide how to deal with subfunctions, nested functions, and potentially even anonymous functions (which have a different declaration syntax).

How about normal Python string operations? This is just an example:

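# Scan a file for MATLAB function declarations using plain string operations.
with open("file") as fh:
    for line in fh:
        sline = line.strip()
        if not sline.startswith("function "):
            continue
        decl = sline[len("function "):]
        # Split off the outputs only when the declaration contains an "=".
        lhs, eq, rhs = decl.partition("=")
        if not eq:
            lhs, rhs = "", decl
        outputs = [o.strip() for o in lhs.strip(" []").split(",") if o.strip()]
        # Everything before "(" is the function name; the rest is the inputs.
        name, paren, args = rhs.strip().partition("(")
        inputs = [a.strip() for a in args.rstrip(")").split(",") if a.strip()]
        print(outputs, name.strip(), inputs)

Output example:

$ cat file
function [mean,stdev] = stat(x)
n = length(x);
mean = sum(x)/n;
stdev = sqrt(sum((x-mean).^2/n));
function mean = avg(x,n)
mean = sum(x)/n;
$ python python.py
['mean', 'stdev'] stat ['x']
['mean'] avg ['x', 'n']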

Of course, MATLAB allows many other declaration forms, such as function nothing or function a = b, so test against those and add whatever checks you need.
