Searching for image files with regular expressions

Question

I have a text file that looks like this:

[22/Nov/2011 12:57:58] "GET /media/js/jquery-1.4.3.min.js HTTP/1.1" 304 0
[22/Nov/2011 12:57:58] "GET /media/js/fancybox/fancybox-x.png HTTP/1.1" 304 0
[22/Nov/2011 12:57:59] "GET /media/js/fancybox/fancybox-y.png HTTP/1.1" 304 0
[22/Nov/2011 12:57:59] "GET /media/js/fancybox/blank.gif HTTP/1.1" 304 0
[22/Nov/2011 12:57:59] "GET /ajax/pages/erlebnisse/ HTTP/1.1" 200 563
[22/Nov/2011 12:58:00] "GET /erlebnisse/alle-erlebnisse/ HTTP/1.1" 200 17114

I want to use regular expressions to get all the image files (.gif, .jpg, .png) that appear here. So the result from the text above should be:

['fancybox-x.png', 'fancybox-y.png', 'blank.gif']

What I did was:

re.findall('\w+\.(jpg|gif|png)', f.read())

So the pattern is:

One or more word characters (\w+), followed by a dot (\.), and then 'jpg', 'gif' or 'png' (jpg|gif|png).

This actually matches, but re.findall() treats the parentheses (which I'm using only for grouping) as capturing group 1, so the result is:

['png', 'png', 'gif']

Which is right, but incomplete. In other words: how can I make re.findall() distinguish between parentheses used only for grouping and parentheses that define capture groups?

Answer 1

You're looking for the non-capturing version of regular parentheses, (?:...). It is described in the re module docs.

s ='''[22/Nov/2011 12:57:58] "GET /media/js/jquery-1.4.3.min.js HTTP/1.1" 304 0
[22/Nov/2011 12:57:58] "GET /media/js/fancybox/fancybox-x.png HTTP/1.1" 304 0
[22/Nov/2011 12:57:59] "GET /media/js/fancybox/fancybox-y.png HTTP/1.1" 304 0
[22/Nov/2011 12:57:59] "GET /media/js/fancybox/blank.gif HTTP/1.1" 304 0
[22/Nov/2011 12:57:59] "GET /ajax/pages/erlebnisse/ HTTP/1.1" 200 563
[22/Nov/2011 12:58:00] "GET /erlebnisse/alle-erlebnisse/ HTTP/1.1" 200 17114'''

import re

# The outer parentheses capture the whole file name; (?:...) only groups the extensions.
for m in re.findall(r'([-\w]+\.(?:jpg|gif|png))', s):
    print(m)
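
To make the difference concrete, here is a minimal sketch comparing the two kinds of parentheses on a single log line taken from the question. The variable names are only illustrative.

```python
import re

line = '"GET /media/js/fancybox/fancybox-x.png HTTP/1.1" 304 0'

# Capturing group: findall returns only what the group matched.
capturing = re.findall(r'[-\w]+\.(jpg|gif|png)', line)

# Non-capturing group: no groups are defined, so findall returns the whole match.
non_capturing = re.findall(r'[-\w]+\.(?:jpg|gif|png)', line)

print(capturing)      # ['png']
print(non_capturing)  # ['fancybox-x.png']
```

With no capturing groups in the pattern, the extra outer pair of parentheses is optional; re.findall() falls back to returning the full match.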

Answer 2

You can just add another pair of parentheses around the whole file name and put ?: at the start of the inner pair:

re.findall(r'/([^/]+\.(?:jpg|gif|png))', f.read())

Note that \w won't match "-", so I would suggest [^/]+ instead.
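
A small sketch of why the character class matters, using one of the hyphenated file names from the question. The variable names are illustrative, not part of the original answer.

```python
import re

line = '"GET /media/js/fancybox/fancybox-x.png HTTP/1.1" 304 0'

# \w+ stops at the hyphen, so only the tail of the name is found.
word_only = re.findall(r'\w+\.(?:jpg|gif|png)', line)

# [^/]+ takes everything after the last slash, hyphen included.
no_slash = re.findall(r'/([^/]+\.(?:jpg|gif|png))', line)

print(word_only)  # ['x.png']
print(no_slash)   # ['fancybox-x.png']
```

Keep in mind that [^/] also matches spaces and newlines, so on differently shaped input a stricter class such as [^/\s]+ may be a safer choice; that variant is a suggestion of this note, not something the answer proposes.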

Answer 3

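If you are looking for the whole match, you should be able to find it in group 0; otherwise, if you are looking for another part of the string, you can add extra parentheses.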
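
re.findall() does not expose group 0 once the pattern contains a capturing group, so one way to follow this suggestion is re.finditer(), which yields match objects. The sketch below assumes the original pattern from the question (with a hyphen added to the character class) and illustrative variable names.

```python
import re

line = '"GET /media/js/fancybox/blank.gif HTTP/1.1" 304 0'

# Each match object exposes the whole match as group(0)
# and the parenthesised extension as group(1).
matches = [(m.group(0), m.group(1))
           for m in re.finditer(r'[-\w]+\.(jpg|gif|png)', line)]

print(matches)  # [('blank.gif', 'gif')]
```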

3 answers

solution1
3 ACCEPTED 2011-11-23 00:58:26

solution2
3 2011-11-23 01:00:14

solution3
0 2011-11-23 00:57:05
