
Python: regex findall

I am using a Python regex to extract certain values from a given string. This is my string:

mystring.txt

sometext
somemore    text here

some  other text

              course: course1
Id              Name                marks
____________________________________________________
1               student1            65
2               student2            75
3               MyName              69
4               student4            43

              course: course2
Id              Name                marks
____________________________________________________
1               student1            84
2               student2            73
8               student7            99
4               student4            32

              course: course4
Id              Name                marks
____________________________________________________
1               student1            97
3               MyName              60
8               student6            82

I need to extract the course name and the corresponding marks for a particular student. For example, I need the course and marks for MyName from the string above.

I tried:

re.findall(r".*?course: (\w+).*?MyName\s+(\d+).*?", buff, re.DOTALL)

This works only if MyName is present under every course. It fails when MyName is missing from some course, as in my example string.

With my example string I get: [('course1', '69'), ('course2', '60')]

but what I actually want to achieve is: [('course1', '69'), ('course4', '60')]

What would be the correct regex for this?

#!/usr/bin/python
import re

# Read the whole file into one string so the regex can span several lines.
with open("mystring.txt", "r") as buffer_fp:
    buff = buffer_fp.read()

print(re.findall(r".*?course: (\w+).*?MyName\s+(\d+).*?", buff, re.DOTALL))
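
.*?course: (\w+)(?:(?!\bcourse\b).)*MyName\s+(\d+).*?
                ^^^^^^^^^^^^^^^^^^^^

You can try this; see the demo. The marked part, (?:(?!\bcourse\b).)*, is a quantifier guarded by a negative lookahead: it consumes one character at a time, and only while the whole word course does not start at the current position. The match therefore cannot run past the next course header, so MyName is only ever paired with the course line immediately before it. As in the original call, re.DOTALL is still needed so that . also matches newlines.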

https://regex101.com/r/pG1kU1/26
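
To check the pattern outside regex101, here is a minimal sketch that runs both the original regex and the lookahead version against a shortened copy of the sample data. The inline buff string and the variable names naive and fixed are only for illustration; they are not part of the question's script.

```python
import re

# Shortened stand-in for mystring.txt: MyName is absent from course2.
buff = """\
sometext

              course: course1
Id              Name                marks
____________________________________________________
1               student1            65
3               MyName              69

              course: course2
Id              Name                marks
____________________________________________________
1               student1            84
8               student7            99

              course: course4
Id              Name                marks
____________________________________________________
1               student1            97
3               MyName              60
"""

# Original attempt: the lazy .*? after the course name is free to skip
# over later "course:" headers until it reaches the next MyName.
naive = re.findall(r".*?course: (\w+).*?MyName\s+(\d+).*?", buff, re.DOTALL)

# Lookahead version: (?:(?!\bcourse\b).)* refuses to step onto a position
# where the standalone word "course" begins, so it stays inside one block.
fixed = re.findall(
    r".*?course: (\w+)(?:(?!\bcourse\b).)*MyName\s+(\d+).*?",
    buff,
    re.DOTALL,
)

print(naive)  # [('course1', '69'), ('course2', '60')]
print(fixed)  # [('course1', '69'), ('course4', '60')]
```

Note that course1, course2 and so on do not trip the lookahead, because \b does not match between "course" and the digit that follows it.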

I suspect this is impossible to do in a single regular expression. They are not all-powerful.

Even if you find a way, don't do this. Your non-working regex is already close to unreadable; a working solution is likely to be even more so. You can most likely do this in just a few lines of meaningful code. Pseudocode solution:

for line in buff.splitlines():
    if it is a course line:
        set the course variable
    if it is a MyName line:
        add (course, marks) to the list of matches

Note that this could (and probably should) involve regexes in each of those if blocks. It's not a case of choosing between the hammer and the screwdriver to the exclusion of the other, but rather using them both for what they do best.
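
One possible way to turn that pseudocode into running Python is sketched below, with a small regex in each branch. The function name marks_for, the two patterns, and the assumption that every data row is laid out as "Id Name marks" separated by whitespace are illustrative guesses based on the sample data, not something the question specifies.

```python
import re


def marks_for(buff, student):
    """Return (course, marks) pairs for one student, scanning line by line."""
    # A course header looks like "   course: course1" in the sample.
    course_re = re.compile(r"^\s*course:\s*(\w+)")
    # Assumed row layout: Id, Name, marks, separated by whitespace.
    row_re = re.compile(r"^\s*\d+\s+" + re.escape(student) + r"\s+(\d+)\s*$")

    matches = []
    course = None  # most recent course header seen so far
    for line in buff.splitlines():
        m = course_re.match(line)
        if m:
            course = m.group(1)
            continue
        m = row_re.match(line)
        if m and course is not None:
            matches.append((course, m.group(1)))
    return matches


sample = (
    "   course: course1\n"
    "1   student1   65\n"
    "3   MyName     69\n"
    "\n"
    "   course: course2\n"
    "1   student1   84\n"
    "\n"
    "   course: course4\n"
    "3   MyName     60\n"
)
result = marks_for(sample, "MyName")
print(result)  # [('course1', '69'), ('course4', '60')]
```

Because the current course is tracked in a plain variable, a course without the student simply contributes nothing, which is the case the single-regex attempt got wrong.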
