Searching for image files with regular expressions

I have a text file that looks like this:

[22/Nov/2011 12:57:58] "GET /media/js/jquery-1.4.3.min.js HTTP/1.1" 304 0
[22/Nov/2011 12:57:58] "GET /media/js/fancybox/fancybox-x.png HTTP/1.1" 304 0
[22/Nov/2011 12:57:59] "GET /media/js/fancybox/fancybox-y.png HTTP/1.1" 304 0
[22/Nov/2011 12:57:59] "GET /media/js/fancybox/blank.gif HTTP/1.1" 304 0
[22/Nov/2011 12:57:59] "GET /ajax/pages/erlebnisse/ HTTP/1.1" 200 563
[22/Nov/2011 12:58:00] "GET /erlebnisse/alle-erlebnisse/ HTTP/1.1" 200 17114

I want to use regular expressions to get all the image files (.gif, .jpg, .png) that appear here. So the result from the text above should be:

['fancybox-x.png', 'fancybox-y.png', 'blank.gif']

What I did was:

re.findall('\w+\.(jpg|gif|png)', f.read())

So the pattern is:

One or more word characters (\w+), followed by a dot (\.), and then 'jpg', 'gif' or 'png' (jpg|gif|png).

This actually works, but re.findall() treats the parentheses (which I'm using only for grouping) as capturing group 1, so the result is:

['png', 'png', 'gif']

Which is right, but incomplete. In other words: how can I make re.findall() distinguish between parentheses used only for grouping and parentheses that define capture groups?

You're looking for the non-capturing version of regular parentheses, (?:...). It is described in the re module docs.

s ='''[22/Nov/2011 12:57:58] "GET /media/js/jquery-1.4.3.min.js HTTP/1.1" 304 0
[22/Nov/2011 12:57:58] "GET /media/js/fancybox/fancybox-x.png HTTP/1.1" 304 0
[22/Nov/2011 12:57:59] "GET /media/js/fancybox/fancybox-y.png HTTP/1.1" 304 0
[22/Nov/2011 12:57:59] "GET /media/js/fancybox/blank.gif HTTP/1.1" 304 0
[22/Nov/2011 12:57:59] "GET /ajax/pages/erlebnisse/ HTTP/1.1" 200 563
[22/Nov/2011 12:58:00] "GET /erlebnisse/alle-erlebnisse/ HTTP/1.1" 200 17114'''

import re

# The inner (?:...) only groups the alternatives; the outer (...) captures the file name.
for m in re.findall(r'([-\w]+\.(?:jpg|gif|png))', s):
    print(m)
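
To make the difference concrete, here is a small sketch that runs both forms of the pattern against one line of the log. The variable names (`line`, `capturing`, `non_capturing`) are illustrative only and do not come from the original post.

```python
import re

line = '[22/Nov/2011 12:57:58] "GET /media/js/fancybox/fancybox-x.png HTTP/1.1" 304 0'

# Capturing group: findall() returns only what the group matched.
capturing = re.findall(r'[-\w]+\.(jpg|gif|png)', line)

# Non-capturing group: the pattern has no groups, so findall() returns the whole match.
non_capturing = re.findall(r'[-\w]+\.(?:jpg|gif|png)', line)

print(capturing)      # ['png']
print(non_capturing)  # ['fancybox-x.png']
```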

You can just add another pair of parentheses around the part you want returned, and put ?: at the start of the inner pair to make it non-capturing:

re.findall(r'/([^/]+\.(?:jpg|gif|png))', f.read())

Note that \w won't match "-", so I would suggest [^/]+ instead.
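
The effect of the character class can be checked on a single request line. This is only a sketch; `line`, `word_only` and `no_slash` are names made up for the illustration.

```python
import re

line = '"GET /media/js/fancybox/fancybox-x.png HTTP/1.1" 304 0'

# \w+ stops at the hyphen, so only the part after it is matched.
word_only = re.findall(r'\w+\.(?:jpg|gif|png)', line)

# [^/]+ takes everything after the last slash of the path, hyphen included.
no_slash = re.findall(r'/([^/]+\.(?:jpg|gif|png))', line)

print(word_only)  # ['x.png']
print(no_slash)   # ['fancybox-x.png']
```

Keep in mind that [^/]+ matches any character except a slash, including spaces, so it relies on the file name directly following the last "/" of the path.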

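If you want the entire match, you should be able to find it in group 0; otherwise, if you want a different part of the string, you can add extra parentheses.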
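
Since re.findall() returns only the groups when the pattern contains any, one way to reach group 0 is re.finditer(), which yields match objects. The following is a minimal sketch under that assumption; `line`, `names` and `extensions` are hypothetical names.

```python
import re

line = '[22/Nov/2011 12:57:58] "GET /media/js/fancybox/fancybox-x.png HTTP/1.1" 304 0'
pattern = r'[-\w]+\.(jpg|gif|png)'

# group(0) is the whole match; group(1) is the first parenthesized group.
names = [m.group(0) for m in re.finditer(pattern, line)]
extensions = [m.group(1) for m in re.finditer(pattern, line)]

print(names)       # ['fancybox-x.png']
print(extensions)  # ['png']
```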
