
Python re pattern matching

I am trying to solve a regex matching problem using the re module. I would like to copy the lines beginning with * from a file. The exact line pattern is:

*7  3   279 0

where the fields are separated by tabs. The regex I use to match these lines is:

regex=re.compile(r'^\*\d+.\n', re.MULTILINE)
for line in f:
    if regex.match(line):
        print >> a, line

The script I wrote creates the file 'a', but it is empty; the pattern is not recognised. Do you have any advice?

Moreover, could you explain the difference between a pattern in double quotes and one in single quotes? I searched several Python manuals but did not find any information.

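Your regex does not cover the whole line. After the asterisk and the digits it allows exactly one more character (anything except a newline) before the end of the line, so it only matches lines such as:

*7x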

Something like r'^\*(?:\d+\s+)+$' should work. There is no need for re.MULTILINE, since you're applying the regex to each line of the file separately.

Edit: Changed to a non-capturing group, since capturing is not needed.
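As a quick check of the pattern suggested above, here is a minimal sketch in Python 3. The sample lines and the helper name starred_lines are made up for illustration; they stand in for the asker's file, with tab-separated fields assumed.

```python
import re

# Pattern from the answer above, written as a raw string.
LINE_RE = re.compile(r'^\*(?:\d+\s+)+$')

def starred_lines(lines):
    """Return the lines made of '*' followed by whitespace-separated numbers."""
    return [line for line in lines if LINE_RE.match(line)]

# Hypothetical sample data; each line keeps its trailing newline,
# as it would when iterating over an open file.
sample = [
    "*7\t3\t279\t0\n",
    "header line\n",
    "*12\t5\t1\t9\n",
    "7\t3\t279\t0\n",   # no leading asterisk
]
kept = starred_lines(sample)
```

Note that the pattern requires whitespace after the last number, which the trailing newline normally supplies. A final line with no newline at the end of the file would not match.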

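Assuming you are ONLY looking for an asterisk followed by a number at the beginning of a line, you only need to do this:

import re

regex = re.compile(r'\*\d+')
for line in f:
    if regex.match(line):  # match() only tests at the start of the line
        a.write(line)      # the line already ends with a newline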
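The loop above can be sketched end to end in Python 3 as follows. The function name copy_starred and the in-memory io.StringIO objects are assumptions for illustration; with real files you would pass objects returned by open() instead.

```python
import io
import re

STAR_NUMBER = re.compile(r'\*\d+')

def copy_starred(src, dst):
    """Copy lines that start with '*' and a digit from src to dst.

    Returns the number of lines copied.
    """
    copied = 0
    for line in src:
        if STAR_NUMBER.match(line):
            dst.write(line)  # the line already carries its own newline
            copied += 1
    return copied

# In-memory stand-ins for the input and output files (hypothetical data).
src = io.StringIO("*7\t3\t279\t0\ncomment\n*x\n*42\t1\n")
dst = io.StringIO()
count = copy_starred(src, dst)
```

Writing with dst.write(line) avoids the extra blank line that a print statement would add, since each line read from a file already ends with a newline.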

If you care about how many whitespace-delimited numbers follow the asterisk:

import re

regex = re.compile(r'\*(?:\d+\s+){3}\d+')  # exactly four numbers
for line in f:
    if regex.match(line):
        a.write(line)  # the line already ends with a newline
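One thing to keep in mind with the counted pattern above: match() does not anchor the end of the line, so a line with more than four numbers still passes. The sketch below (Python 3, with hypothetical names and sample strings) contrasts it with a stricter variant that adds \s*$ to reject extra fields.

```python
import re

# Pattern from the answer: an asterisk and at least four numbers.
FOUR_OR_MORE = re.compile(r'\*(?:\d+\s+){3}\d+')

# Assumed stricter variant: nothing but whitespace may follow the fourth number.
EXACTLY_FOUR = re.compile(r'\*(?:\d+\s+){3}\d+\s*$')

def matches(pattern, line):
    """Return True if the pattern matches at the start of the line."""
    return pattern.match(line) is not None

four = "*7\t3\t279\t0\n"
five = "*7 3 279 0 5\n"
two = "*7\t3\n"

loose_results = [matches(FOUR_OR_MORE, s) for s in (four, five, two)]
strict_results = [matches(EXACTLY_FOUR, s) for s in (four, five, two)]
```

Which variant is appropriate depends on whether lines with extra columns should be copied.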

If you use re.match, you don't need the ^ anchor, because match only tests at the start of the string. If you use re.search, you do. See the docs.
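The difference between the two calls can be seen in a small Python 3 sketch. The sample strings are invented for illustration.

```python
import re

unanchored = re.compile(r'\*\d+')
anchored = re.compile(r'^\*\d+')

mid_line = "value *7\t3\n"   # asterisk appears in the middle of the line
at_start = "*7\t3\n"         # asterisk is the first character

# match() only tries position 0, so no anchor is needed.
match_mid = unanchored.match(mid_line)

# search() scans the whole string, so the unanchored pattern
# also finds an asterisk that is not at the start.
search_mid = unanchored.search(mid_line)

# With ^, search() behaves like match() for single-line input.
anchored_mid = anchored.search(mid_line)
anchored_start = anchored.search(at_start)
```

Here match_mid and anchored_mid are None, while search_mid and anchored_start are match objects for "*7".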

Try this:

 re.compile(r'^\*\d\s+\d+\s+')

I don't know Python, but it seems the regex should be ^[*][\d(\s)*]+$
