
Regex format to find a specific string from the list

This is not for homework!

Hello,

Just a quick question about Regex formatting.

I have a list of different courses.

L = ['CI101', 'CS164', 'ENGL101', 'I-', 'III-', 'MATH116', 'PSY101']

I am looking for a pattern that finds all the words that start with I, II, or III. Here is what I tried (in Python, FYI):

for course in L:
    if re.search("(I?II?III?)*", course):
        L.pop()

I learned that ? in regex means optional, so my idea was to make I, II, and III optional and use * to include whatever follows. However, it does not work as I intended. What would be a better pattern?

Thanks

Here is the regex you should use:

^I{1,3}.*$

^ anchors the match at the start of the line. I{1,3} matches I repeated 1 to 3 times. .* matches any remaining characters, and $ anchors the end of the line. So this regex matches all the words that start with I, II, or III.
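
As a quick check, the pattern can be applied to the list from the question. This is only a minimal sketch using Python's re module; the names courses, starts_with_i, matched, and kept are illustrative and not from the original post.

```python
import re

courses = ['CI101', 'CS164', 'ENGL101', 'I-', 'III-', 'MATH116', 'PSY101']

# ^ anchors at the start, I{1,3} requires one to three I's,
# and .*$ consumes whatever follows up to the end of the string.
starts_with_i = re.compile(r'^I{1,3}.*$')

matched = [c for c in courses if starts_with_i.search(c)]
kept = [c for c in courses if not starts_with_i.search(c)]

print(matched)  # ['I-', 'III-']
print(kept)     # ['CI101', 'CS164', 'ENGL101', 'MATH116', 'PSY101']
```

Note that a string beginning with four or more I's would also match, because .* accepts the remaining I's.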

Now look at your regex. First, it has no ^, so it can match I anywhere in the string. Second, ? only affects the single character before it, so the first I is optional but the second is not, the third is optional, the fourth and fifth are not, and the sixth is optional. Finally, the parentheses followed by * mean the whole group can repeat any number of times, including zero. So the pattern matches either zero I's or at least three I's. Because zero repetitions are allowed, re.search finds an empty match in every string, which is why every course passes the test.
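
The behavior of the original pattern can be inspected directly. The following is a small sketch, assuming the same list as in the question; the names original, results, and always_matches are made up for illustration.

```python
import re

courses = ['CI101', 'CS164', 'ENGL101', 'I-', 'III-', 'MATH116', 'PSY101']

# The pattern from the question, unchanged.
original = re.compile(r"(I?II?III?)*")

# The group needs at least three I's, but * lets it repeat zero times,
# so an empty match at position 0 is always available.
results = {c: original.search(c).group(0) for c in courses}
always_matches = all(original.search(c) is not None for c in courses)

print(always_matches)        # True
print(repr(results['I-']))   # '' (empty match, a single I is not enough)
print(repr(results['III-'])) # 'III'
```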

Instead of search(), you can use match(), which only matches the pattern at the beginning of the string:

import re

l = ['CI101', 'CS164', 'ENGL101', 'I-', 'III-', 'MATH116', 'PSY101']

pattern = re.compile(r'I{1,3}')

[i for i in l if not pattern.match(i)]
# ['CI101', 'CS164', 'ENGL101', 'MATH116', 'PSY101']
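
To illustrate why match() is enough here without a ^ anchor, the sketch below compares search() and match() on two of the course names. The variable names are illustrative only.

```python
import re

pattern = re.compile(r'I{1,3}')

# search() scans the whole string, so it finds the I inside 'CI101'.
found_anywhere = pattern.search('CI101') is not None

# match() only tries position 0, so 'CI101' is rejected and 'III-' is accepted.
found_at_start = pattern.match('CI101') is not None
roman_at_start = pattern.match('III-') is not None

print(found_anywhere)  # True
print(found_at_start)  # False
print(roman_at_start)  # True
```

Building a new list with a comprehension also avoids removing items from the list while iterating over it. In the loop from the question, L.pop() with no argument removes the last element of the list, not the current course.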
