Search backward through a string using a regex (in Python)?

Context
I'm parsing some code and want to match the doxygen comments before a function. However, because I want to match a specific function name, getting only the immediately preceding comment is giving me problems.

Current Approach

import re
function_re = re.compile(
    r"\/\*\*(.+)\*\/\s*void\s+(\w+)\s*::\s*function_name\s*\(\s*\)\s*")
function_match = function_re.search(file_string)
if function_match:
    # group(1) is the comment body; group(2) is the class name
    function_doc_str = function_match.group(1)

Problem with Current Approach
The current approach matches doxygen comments from earlier functions, so the result is the wrong doxygen comment.

Question
Is there a way to search backward through a string using the Python regex library?
It seems like my problem is that the more restrictive (less frequently occurring) part is the function signature, "void function()".

Possible better question
Is there a better (easier) approach that I'm missing?

The simplest way is to just use a group; you don't need to go backwards...

 (commentRegex)functionRegex

Then just extract group 1. You will need to run in multi-line mode to get it working; I don't know Python, so I can't be more helpful.

It's also possible with lookahead assertions, but this way is simpler.
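In Python, the group idea above might look like the following sketch. The sample text, the comment pattern, and the helper name find_comment are assumptions made for illustration. Note that in Python's re module the flag that lets . cross newlines is re.DOTALL (re.MULTILINE only changes ^ and $); the comment pattern used here needs neither, because it never relies on the dot.

```python
import re

# Hypothetical sample input, not taken from the question.
SOURCE = """
/** first comment */
void Foo::other()
{
}

/** second comment */
void Foo::function_name()
{
}
"""

# Group 1 is the comment body. It may contain '*' only when no '/' follows,
# so the group cannot run past the end of a comment.
COMMENT = r"/\*\*((?:[^*]|\*(?!/))*)\*/"


def find_comment(source, function_name):
    """Return the body of the comment directly before function_name, or None."""
    function = r"\s*\w+\s+(?:\w+\s*::\s*)?" + re.escape(function_name) + r"\s*\("
    match = re.search(COMMENT + function, source)
    return match.group(1) if match else None


print(find_comment(SOURCE, "function_name"))
```

Only whitespace is allowed between the comment and the signature, which is what ties each comment to the function that follows it.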

I think you should use a regex that only matches the doxygen documentation that's immediately before the function. Maybe something like this (simplified example):

import re

test = """

/**
    @doxygen comment
*/
void function()
{
}

"""

doxygenRegex = r"(?P<comment>/\*\*(?:[^/]|/(?!\*\*))*\*/)"
functionRegex = r"(?P<function>\s\w+\s+(?P<functionName>\w+)\s*\()"

match = re.search(doxygenRegex + functionRegex, test)
print(match.groupdict())

As long as this matches something, you can loop the regex matching, starting the next search at test[match.end():] each time. Hope that makes sense to you...
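A possible shape for that loop is sketched below. Instead of slicing the string, it passes a start offset to the compiled pattern's search method, which has the same effect; the two-function sample text and the comments dictionary are assumptions added for illustration.

```python
import re

# Hypothetical input with two documented functions.
test = """
/**
    @doxygen comment1
*/
void function1()
{
}

/**
    @doxygen comment2
*/
void function2()
{
}
"""

doxygenRegex = r"(?P<comment>/\*\*(?:[^/]|/(?!\*\*))*\*/)"
functionRegex = r"(?P<function>\s\w+\s+(?P<functionName>\w+)\s*\()"
pattern = re.compile(doxygenRegex + functionRegex)

comments = {}
pos = 0
while True:
    # search(text, pos) resumes from an offset without copying the string.
    match = pattern.search(test, pos)
    if match is None:
        break
    comments[match.group("functionName")] = match.group("comment")
    pos = match.end()

print(sorted(comments))
```

Looking up comments["function2"] afterwards gives the comment for that specific function.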

BTW if you only want to extract the comment and nothing about the function, you can use lookahead: just replace functionRegex with r"(?=\s\w+\s+\w+\s*\()" .
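As a rough sketch of the lookahead variant, the snippet below requires a function header after the comment but keeps it out of the match. The variable name functionLookahead is an assumption; the patterns are the ones from this answer.

```python
import re

test = """
/**
    @doxygen comment
*/
void function()
{
}
"""

doxygenRegex = r"(?P<comment>/\*\*(?:[^/]|/(?!\*\*))*\*/)"
# Lookahead: a function header must come next, but it is not consumed.
functionLookahead = r"(?=\s\w+\s+\w+\s*\()"

match = re.search(doxygenRegex + functionLookahead, test)
print(repr(match.group(0)))
```

Because the lookahead consumes nothing, match.end() points just past the closing */ rather than past the function header.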

This can be achieved using a single regex.

The key is to capture the comment just before the desired function. The easy way to do this would seem to be a non-greedy qualifier, for example /\*\*(.*?)\*/ . In Python that did not work for me: by default . does not match newlines (that takes the DOTALL flag; MULTILINE only affects ^ and $), and even with DOTALL a non-greedy group can still stretch across an earlier */ once a function pattern follows it. So, you need a little trick like this:

/\*\*((?:[^\*]|\*(?!/))*)\*/

This is to match:

1: the comment begin /**

2: everything that is not *, or a * that is not followed by /

3: the comment end */
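To see why step 2 matters, here is a small comparison between a non-greedy dot and the explicit pattern. The two-function sample and the hard-coded signature are assumptions made for illustration.

```python
import re

# Hypothetical sample with two documented functions.
text = """
/** comment1 */
void test::function1() {}

/** comment2 */
void test::function2() {}
"""

signature = r"\s*void\s+test::function2\s*\("

# Lazy dot with DOTALL: matching starts at the first "/**" and the lazy
# group keeps growing until the signature fits, so it swallows function1.
lazy = re.search(r"/\*\*(.*?)\*/" + signature, text, re.DOTALL)

# Explicit class: the group can never contain "*/", so the match has to
# start at the comment directly before function2.
strict = re.search(r"/\*\*((?:[^*]|\*(?!/))*)\*/" + signature, text)

print(repr(lazy.group(1)))
print(repr(strict.group(1)))
```

The lazy version returns text that starts in the first comment and runs through function1, while the explicit version returns only the second comment's body.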

From this idea the code you want is: 根据这个想法,您想要的代码是:

import re

function_name  = "function2"
# Raw strings keep the backslashes intact for the regex engine.
regex_comment  = r"/\*\*((?:[^\*]|\*(?!/))*)\*/"
regex_static   = r"(?:(\w+)\s*::\s*)?"
regex_function = r"(\w+)\s+" + regex_static + "(?:" + function_name + r")\s*\([^\)]*\)"
regex = re.compile(regex_comment + r"\s*" + regex_function, re.MULTILINE)
text  = """
/**
    @doxygen comment1
*/
void test::function1()
{
}

/**
    @doxygen comment2
*/
void test::function2()
{
}
"""
match = regex.search(text)
if match is None:
    print("None")
else:
    print(match.group(1))  # group 1 is the comment body

When run, this prints:


    @doxygen comment2

Variation: If you want to capture /** and */ too, use regex_comment = r"(/\*\*(?:[^\*]|\*(?!/))*\*/)" .

Hope this helps.

Note that C isn't a regular language, so it cannot be parsed by regular expressions. Have you considered leveraging doxygen itself to parse this file?

You can do lookbehind assertions with (?<=...) and (?<!...), but generally you can only match forward.
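For reference, a minimal sketch of what a lookbehind can do here. In Python's re module a lookbehind must have a fixed width, so it can check that a */ sits directly before a function but cannot capture a variable-length comment. The sample text is an assumption and expects exactly one newline between the comment and the function.

```python
import re

# Hypothetical input: one documented and one undocumented function.
text = "/** doc */\nvoid documented()\n{\n}\n\nvoid undocumented()\n{\n}\n"

# (?<=...) only checks what precedes the match; its content must be fixed width.
documented = re.findall(r"(?<=\*/\n)void\s+(\w+)\s*\(", text)
print(documented)
```

This finds which functions have a comment right above them, not the comment text itself.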

The question is why these comments are not inside the function, so you can use doc.

But there is no easy way with regex.
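If the goal is literally to look backward from the function, one option that does not rely on a single regex is to find the signature first and then scan right to left with str.rfind. This is only a sketch: the helper name comment_before, the signature pattern, and the sample text are assumptions, not part of any answer here.

```python
import re

# Hypothetical sample with two documented functions.
text = """
/**
    @doxygen comment1
*/
void test::function1()
{
}

/**
    @doxygen comment2
*/
void test::function2()
{
}
"""


def comment_before(source, function_name):
    """Return the /** ... */ block directly before function_name, or None."""
    signature = re.search(
        r"\w+\s+(?:\w+\s*::\s*)?" + re.escape(function_name) + r"\s*\(", source)
    if signature is None:
        return None
    # rfind scans right to left, bounded by the start of the signature.
    end = source.rfind("*/", 0, signature.start())
    if end == -1 or source[end + 2:signature.start()].strip():
        return None  # no comment, or other code sits in between
    start = source.rfind("/**", 0, end)
    if start == -1:
        return None
    return source[start:end + 2]


print(comment_before(text, "function2"))
```

The whitespace check between the comment and the signature keeps a comment that belongs to an earlier function from being returned by mistake.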

Here's a non-regex approach: split on */ and check whether the function you are looking for is in the next item, e.g.

test = """

/**
    @doxygen comment
*/
void function()
{
}

"""

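t = test.split("*/")
# Pair each piece with the piece that follows it; the comment text is in
# the first piece and the function, if any, starts the second.
for comm, following in zip(t, t[1:]):
    if "void" in following:
        print(comm)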
