
Python RegEx capturing first word after pattern

Possible strings are:

  1. public class MyClass extends ParentClass {

or

  2. public class MyClass throws SomeException {

or just

  3. public class MyClass {

I am using the following pattern to always capture MyClass:

ptrn = "((public|private|protected)\s+(.*)\s*[class|interface]\s+(\w+))"

But when I do

regex = re.search(ptrn, text)

className = regex.group(4) 

For strings 1 and 2 I get ParentClass and SomeException respectively; only for string 3 do I get MyClass.

What is wrong with my regex pattern and how do I fix it?

I don't know Python, but I do know regex fairly well. What you are looking for is something more like: (public|private|protected)\s+(class|interface)\s+(\w+)

I don't know which group that would be in Python, but in most other languages it would be group 3 (0 would be the whole match, 1 would be public, private or protected, 2 would be class or interface, and 3 would be your class name).
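In Python the numbering works the same way: the re module counts groups by their opening parenthesis, left to right, and group 0 is the whole match. A minimal sketch, assuming the pattern from this answer written as a raw string (the variable names are only for illustration):

```python
import re

# Pattern from the answer above, written as a Python raw string so the
# backslashes do not need doubling.
pattern = r"(public|private|protected)\s+(class|interface)\s+(\w+)"

text = "public class MyClass extends ParentClass {"
match = re.search(pattern, text)

whole = match.group(0)       # entire matched text
modifier = match.group(1)    # public, private or protected
kind = match.group(2)        # class or interface
class_name = match.group(3)  # the identifier after class/interface

print(whole, modifier, kind, class_name, sep=" | ")
```

The question's pattern wraps everything in one extra outer pair of parentheses, which is why the class name sits in group 4 there rather than group 3.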

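[class|interface] is a character class: it matches exactly one character from the set c, l, a, s, |, i, n, t, e, r, f. Because the greedy (.*) in front of it consumes as much as it can, that single character ends up being the final "s" of "extends" or "throws", and (\w+) then captures ParentClass or SomeException. Instead, you probably want the alternation (class|interface).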
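The difference can be checked directly. The sketch below runs the pattern from the question and the corrected one against the first sample string; the variable names are illustrative only.

```python
import re

text = "public class MyClass extends ParentClass {"

# Pattern from the question: [class|interface] matches ONE character,
# and the greedy (.*) pushes that character as far right as possible.
broken = r"((public|private|protected)\s+(.*)\s*[class|interface]\s+(\w+))"
# Corrected pattern: (class|interface) matches one of the two whole words.
fixed = r"((public|private|protected)\s+(class|interface)\s+(\w+))"

broken_match = re.search(broken, text)
fixed_match = re.search(fixed, text)

print(broken_match.group(3))  # class MyClass extend  (swallowed by .*)
print(broken_match.group(4))  # ParentClass
print(fixed_match.group(4))   # MyClass
```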

http://rubular.com/r/Jc6o3SAhi3

This works:

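import re

strings = ("public class MyClass extends ParentClass {","public class MyClass throws SomeException {","public class MyClass {")
# Raw string, so \s and \w reach the regex engine unchanged
pattern = r"((public|private|protected)\s+(class|interface)\s+(\w+))"

for string in strings:
    # Group 4 is the identifier that follows class/interface
    print(re.search(pattern, string).group(4))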
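One caveat with the loop above: re.search returns None when nothing matches, so calling .group(4) on a line without a declaration raises AttributeError. A minimal sketch of a guard, where find_class_name is a hypothetical helper name and not part of the original answer:

```python
import re

# Same pattern as in the answer, compiled once as a raw string.
DECLARATION = re.compile(
    r"((public|private|protected)\s+(class|interface)\s+(\w+))"
)

def find_class_name(line):
    """Return the declared class/interface name, or None if the line has none."""
    match = DECLARATION.search(line)
    return match.group(4) if match else None

print(find_class_name("public class MyClass throws SomeException {"))  # MyClass
print(find_class_name("int counter = 0;"))                             # None
```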
