
How to find a substring using str.find or regex?

I am trying to process all lines containing /* Test number */ in a C++ file using Python.

For example a.cpp:

int main(void)
{
    /* Test 1 */          //will be found, and replaced.
    int a =1;

    /* Test 2 */          //will be found, and replaced.
    int b = 2;

    return 0;
 }

In my python, I tried:

with open(fname, 'rw') as f:
    for line_term in f:
        line = line_term.rstrip('\n')
            if(re.match('/\*\s[Test]\s\d+\*/', line):
                print line

But I got no output at all. I am fairly new to regular expressions, so any suggestions are welcome.

I corrected your regex and the if statement syntax.

import re

with open(fname, 'r') as f:
    for line_term in f:
        line = line_term.rstrip('\n')
        # re.search scans the whole line; re.match would fail on the leading indentation
        if re.search(r'/\* Test \d+ \*/', line):
            print(line)

re.match starts matching at the beginning of the string, so you could start your pattern by matching one or more spaces.

You can omit the square brackets around [Test]: they put the characters in a character class, which matches any single one of the listed characters and could equally be written as [Ttes].

Note that a space is missing after the digits in your pattern, and that \s also matches a newline, which might be unwanted if you only want to match characters on the same line.

In the pattern below the spaces are placed between square brackets for clarity, but they don't need to be.

[ ]+/\*[ ]Test[ ]\d+[ ]\*/


Your code could look like:

import re

with open(fname, 'r') as f:
    for line_term in f:
        line = line_term.rstrip('\n')
        if re.match(r' +/\*\sTest \d+ \*/', line):
            print(line)
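
To illustrate the point above about \s versus a literal space, here is a minimal sketch. The two sample strings are made up for the demonstration and do not come from the original file.

```python
import re

# Hypothetical sample strings, invented for this illustration.
same_line = "    /* Test 1 */"
tab_separated = "    /*\tTest 1 */"

literal_space = re.compile(r" +/\* Test \d+ \*/")
any_whitespace = re.compile(r" +/\*\sTest \d+ \*/")

# A literal space in the pattern only accepts a space character.
space_hits = [bool(literal_space.match(s)) for s in (same_line, tab_separated)]
# \s also accepts tabs and newlines.
ws_hits = [bool(any_whitespace.match(s)) for s in (same_line, tab_separated)]

print(space_hits)  # [True, False]
print(ws_hits)     # [True, True]
```

Which one you want depends on how strictly the comment format is enforced in your sources.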

Use search() instead of match(), because re.match() only matches at the beginning of the string. To match and replace in one step you can also use re.sub(). With search():

with open(fname, 'r') as f:
    for line_term in f:
        line = line_term.rstrip('\n')
        if(re.search(r'/[*] Test \d+ [*]/', line)):
            print (line)

output:

    /* Test 1 */          //will be found, and replaced.
    /* Test 2 */          //will be found, and replaced.
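
The difference between the two functions can be checked on a single line from the question. This is only a small sketch; the variable names are chosen for the example.

```python
import re

line = "    /* Test 1 */          //will be found, and replaced."
pattern = r"/\* Test \d+ \*/"

# re.match only tries at index 0, where this line has a space, so it fails.
matched = re.match(pattern, line)
# re.search scans forward and finds the comment after the indentation.
found = re.search(pattern, line)

print(matched)        # None
print(found.group())  # /* Test 1 */
```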

Sounds like you've got the solution to your basic question from comments, but let's take a look at your regex so you can understand what the problem was.

Your regex:

/\*\s[Test]\s\d+\*/

It's looking good for the most part. You've escaped the * by adding \ in front. You're using \s to match the space. That will match any whitespace, mind you: a tab, a newline, or whatever. If you just want a space, which it looks like you do, you can simply put a space there (like this: /\* Test \d+ \*/).

The main thing you've got wrong is [Test]. This is what's called a character class or a character set. It will match T or e or s or t. Just one of them. Not "Test". When you remove the character class brackets, you're left with "Test", which will match itself exactly. Character classes can be really useful, though, if you want to match something specific. If we want to match 1, 2, 3, 4, and T and c, or whatever, we could do this: [1234Tc].

If you want it one or more times, [1234Tc]+
If you want it zero or more times, [1234Tc]*
If you want it to match between 2 and 5 times, [1234Tc]{2,5}
If you want it to match 4 times, [1234Tc]{4}

That last one would have worked for your character class. [Test]{4} would have matched your test. That said, it would also have matched "esTt".
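
The character class behaviour described above can be verified with a short sketch. It uses re.fullmatch so that each pattern has to cover the whole string; the variable names are only for this example.

```python
import re

# [Test] is a set: it matches exactly one character out of T, e, s, t.
one_char = re.fullmatch(r"[Test]", "Test")          # no match: the string has four characters
literal = re.fullmatch(r"Test", "Test")             # the plain word matches itself
four_from_set = re.fullmatch(r"[Test]{4}", "Test")  # four characters, each taken from the set
scrambled = re.fullmatch(r"[Test]{4}", "esTt")      # any order of set members also matches

print(one_char is None)   # True
print(bool(literal))      # True
print(bool(four_from_set))  # True
print(bool(scrambled))    # True
```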

Anyway, hopefully that's given you a better idea of what was going on there. It'll eventually click once you've learned all the rules. Happy regexing!

with open(fname, 'r') as f:
    for line_term in f:
        line = line_term.rstrip('\n')
        if re.match(r'.*/\* Test \d+ \*/.*', line):
            print(line)

If you want to replace the found patterns with one specific thing, then you should use the re.sub method.

with open(fname, 'r') as f:
    content = f.read()
    pattern = r'/\*\s*[Tt][Ee][Ss][Tt]\s*\d+.*?(?<=\*/)'
    replacement = ''  # empty string: the matched comments are removed
    print(re.sub(pattern, replacement, content))

Your input will be printed without the comments containing the test number.

Now let's have a look at the pattern itself:

/\* -> the beginning of the comment

\s*[Tt][Ee][Ss][Tt]\s*\d+ -> the test part with the number, and optional whitespace around the word

.*?(?<=\*/) -> everything up to the first comment-closing sequence

I do not recommend replacing the whole line, because the line can contain another multi-line comment that ends on a later line.
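
As a rough sketch of replacing only the comment and keeping the rest of each line, the same pattern can be applied to the a.cpp text from the question held in a string. The replacement text '/* done */' is a made-up placeholder, not something from the question.

```python
import re

# The a.cpp sample from the question, kept in a string so the sketch is self-contained.
source = """int main(void)
{
    /* Test 1 */          //will be found, and replaced.
    int a =1;

    /* Test 2 */          //will be found, and replaced.
    int b = 2;

    return 0;
}
"""

pattern = r'/\*\s*[Tt][Ee][Ss][Tt]\s*\d+.*?(?<=\*/)'
# Hypothetical replacement text, chosen only for the demonstration.
result = re.sub(pattern, '/* done */', source)
print(result)
```

Only the two test comments change; the trailing // remarks and the code lines are left untouched.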
