
How to find a substring using str.find or regex?

I am trying to process all lines containing /* Test number */ in a C++ file using Python.

For example, a.cpp:

int main(void)
{
    /* Test 1 */          //will be found, and replaced.
    int a =1;

    /* Test 2 */          //will be found, and replaced.
    int b = 2;

    return 0;
}

In my Python script, I tried:

with open(fname, 'rw') as f:
    for line_term in f:
        line = line_term.rstrip('\n')
            if(re.match('/\*\s[Test]\s\d+\*/', line):
                print line

But I got no output at all. I am fairly new to regular expressions, so please give your suggestions.

I corrected your regex and the if statement syntax.

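import re

with open(fname, 'r') as f:  # 'rw' is not a valid mode; read-only is enough here
    for line_term in f:
        line = line_term.rstrip('\n')
        # re.match only matches at the very start of the line
        if re.match(r'\/\* Test \d+ \*\/', line):
            print(line)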

re.match starts matching at the beginning of the string, so you could start your pattern by matching one or more spaces.
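As a rough illustration of that anchoring behaviour, here is a small sketch that uses one of the lines from a.cpp. The variable names are only for this example, and re.search is shown for comparison.

```python
import re

# One of the indented lines from a.cpp.
line = "    /* Test 1 */          //will be found, and replaced."
pattern = r"/\* Test \d+ \*/"

# re.match only tries position 0, where this line has a space.
anchored = re.match(pattern, line)

# Allowing one or more leading spaces lets re.match succeed.
with_spaces = re.match(r" +" + pattern, line)

# re.search scans the whole line for the pattern.
searched = re.search(pattern, line)

print(anchored)
print(with_spaces is not None)
print(searched.group())
```

Which variant is preferable probably depends on whether the comment is always the first thing on the line.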

You should omit the square brackets around [Test], because they put the characters in a character class, which matches any single one of the listed characters and could equally be written as [Ttes].

Note that there is a space missing after the digits in your pattern. Also, \s matches a newline as well, which might be unwanted if you only want to match characters on the same line.
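Both points can be checked with a short sketch. The sample strings below are made up for illustration; split_comment is a hypothetical comment whose text starts on the next line.

```python
import re

comment = "/* Test 1 */"

# Without a space after \d+, the pattern expects "*/" directly after the digits.
no_space = re.search(r"/\*\sTest\s\d+\*/", comment)

# With the space added, the comment matches.
with_space = re.search(r"/\* Test \d+ \*/", comment)

# \s also accepts a newline, while a literal space does not.
split_comment = "/*\nTest 1 */"
newline_with_s = re.search(r"/\*\sTest \d+ \*/", split_comment)
newline_with_space = re.search(r"/\* Test \d+ \*/", split_comment)

print(no_space, with_space is not None)
print(newline_with_s is not None, newline_with_space is not None)
```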

For clarity, the spaces below are written inside square brackets, but they don't need to be.

[ ]+/\*[ ]Test[ ]\d+[ ]\*/


Your code could look like:

import re

with open(fname, 'r') as f:
    for line_term in f:
        line = line_term.rstrip('\n')
        # one or more leading spaces, then the comment
        if re.match(r' +/\*\sTest \d+ \*/', line):
            print(line)

Use search() instead of match(), because re.match() only matches at the beginning of the string. You can also use re.sub() to match and replace strings in one step:

with open(fname, 'r') as f:
    for line_term in f:
        line = line_term.rstrip('\n')
        if(re.search(r'/[*] Test \d+ [*]/', line)):
            print (line)

Output:

    /* Test 1 */          //will be found, and replaced.
    /* Test 2 */          //will be found, and replaced.
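The re.sub() variant mentioned above is not shown in the answer, so here is a minimal sketch of how it might look. The replacement text "/* Case N */" is purely an assumption for illustration, and the file contents are inlined as a string instead of being read from fname.

```python
import re

# Sample input mirroring a.cpp from the question.
source = """int main(void)
{
    /* Test 1 */          //will be found, and replaced.
    int a =1;

    /* Test 2 */          //will be found, and replaced.
    int b = 2;

    return 0;
}
"""

# The group (\d+) captures the test number so it can be reused.
pattern = re.compile(r"/[*] Test (\d+) [*]/")

# \1 in the replacement re-inserts the captured number.
# "Case" is a hypothetical replacement chosen for this example.
result = pattern.sub(r"/* Case \1 */", source)
print(result)
```

Because re.sub() works on the whole string, there is no need to loop over lines or test for a match first.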

Sounds like you've got the solution to your basic question from the comments, but let's take a look at your regex so you can understand what the problem was.

Your regex:

\*\s[Test]\s\d+\*

It's looking good for the most part. You've escaped the * by adding \ in front. You're using \s to match the space. That'll match any whitespace, mind you: a tab, a newline, or whatever. If you just want a space, which it looks like you do, you can just put a space there (like this: "/* Test */").

The main thing you've got wrong is [Test]. This is what's called a character class or a character set. It will match T or e or s or t. Just one of them. Not "Test". When you remove the character class brackets, you're left with "Test", which will match itself exactly. Character classes can be really useful, though, if you want to match something specific. If we want to match 1, 2, 3, 4, T, or c, or whatever, we could do this: [1234Tc].

If you want it one or more times, [1234Tc]+
If you want it zero or more times, [1234Tc]*
If you want it to match between 2 and 5 times, [1234Tc]{2,5}
If you want it to match 4 times, [1234Tc]{4}

That last one would have worked for your character class. [Test]{4} would have matched "Test". That said, it would have also matched "esTt".
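To make the difference concrete, here is a small sketch comparing the character class with the literal word. The test string "/* T 1*/" is a made-up input that shows what the original pattern was really asking for.

```python
import re

# A character class matches exactly one character from the set at a time.
single_chars = re.findall(r"[Test]", "Test")

# With {4} it accepts any four characters from the set, in any order.
class_on_word = re.fullmatch(r"[Test]{4}", "Test")
class_on_scramble = re.fullmatch(r"[Test]{4}", "esTt")

# The literal pattern only accepts the exact word.
literal_on_scramble = re.fullmatch(r"Test", "esTt")

# The original pattern matches a comment with a single letter and
# no space before the closing "*/".
original = re.search(r"/\*\s[Test]\s\d+\*/", "/* T 1*/")

print(single_chars)
print(class_on_word is not None, class_on_scramble is not None)
print(literal_on_scramble, original is not None)
```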

Anyway, hopefully that's given you a better idea of what was going on there. It'll eventually click once you've learned all the rules. Happy regexing!

with open(fname, 'r') as f:
    for line_term in f:
        line = line_term.rstrip('\n')
        if re.match(r'.*\/\* Test \d+ \*\/.*', line):
            print(line)

If you want to replace the found patterns with one specific thing, then you should use the re.sub method.

import re

with open(fname, 'r') as f:
    content = f.read()
    pattern = r'/\*\s*[Tt][Ee][Ss][Tt]\s*\d+.*?(?<=\*/)'
    replacement = ''  # replace each matching comment with nothing
    print(re.sub(pattern, replacement, content))

Your input will be printed out without the comments containing the test number.

Now let's have a look at the pattern itself:

/\* -> the beginning of the comment

\s*[Tt][Ee][Ss][Tt]\s*\d+ -> the test part with the number, with optional whitespace before the word and before the number

.*?(?<=\*/) -> everything up to the very first comment-closing */

I do not recommend replacing the whole line, because the line can contain another multi-line comment that ends on another line.
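A quick sketch of why the lazy .*? matters here. The sample line is hypothetical, and the greedy variant is only included for comparison; it is not part of the answer's pattern.

```python
import re

pattern = r'/\*\s*[Tt][Ee][Ss][Tt]\s*\d+.*?(?<=\*/)'

# Hypothetical line with a Test comment followed by another comment.
line = "/* Test 1 */ int a = 1; /* keep me */"

# The lazy .*? stops at the first position preceded by "*/".
lazy_result = re.sub(pattern, "", line)

# A greedy .* would run on to the last "*/" on the line instead.
greedy = r'/\*\s*[Tt][Ee][Ss][Tt]\s*\d+.*(?<=\*/)'
greedy_result = re.sub(greedy, "", line)

print(repr(lazy_result))
print(repr(greedy_result))
```

Note that . does not match a newline by default, so a Test comment that itself spans several lines would presumably need the re.DOTALL flag.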
