I have a text file that looks like this:
[22/Nov/2011 12:57:58] "GET /media/js/jquery-1.4.3.min.js HTTP/1.1" 304 0
[22/Nov/2011 12:57:58] "GET /media/js/fancybox/fancybox-x.png HTTP/1.1" 304 0
[22/Nov/2011 12:57:59] "GET /media/js/fancybox/fancybox-y.png HTTP/1.1" 304 0
[22/Nov/2011 12:57:59] "GET /media/js/fancybox/blank.gif HTTP/1.1" 304 0
[22/Nov/2011 12:57:59] "GET /ajax/pages/erlebnisse/ HTTP/1.1" 200 563
[22/Nov/2011 12:58:00] "GET /erlebnisse/alle-erlebnisse/ HTTP/1.1" 200 17114
I want to use regular expressions to get all the image files (.gif, .jpg, .png) that appear here. So the result from the text above should be:
['fancybox-x.png', 'fancybox-y.png', 'blank.gif']
What I did was:
re.findall('\w+\.(jpg|gif|png)', f.read())
So the pattern is: 1 or more word characters (\w+), followed by a dot (\.), and then 'jpg', 'gif' or 'png' (jpg|gif|png).
This actually works, but re.findall() treats the content of the parentheses (which I'm using only for "grouping") as group(1), so the result is:
['png', 'png', 'gif']
Which is right, but incomplete. In other words, I'm asking: how can I make re.findall() distinguish between "grouping" parentheses and parentheses that define capture groups?
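You're looking for the non-capturing version of regular parentheses, (?:...). When a pattern contains capturing groups, re.findall() returns the groups rather than the whole match, which is why you only get the extensions. The description is available in the re module docs.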
s ='''[22/Nov/2011 12:57:58] "GET /media/js/jquery-1.4.3.min.js HTTP/1.1" 304 0
[22/Nov/2011 12:57:58] "GET /media/js/fancybox/fancybox-x.png HTTP/1.1" 304 0
[22/Nov/2011 12:57:59] "GET /media/js/fancybox/fancybox-y.png HTTP/1.1" 304 0
[22/Nov/2011 12:57:59] "GET /media/js/fancybox/blank.gif HTTP/1.1" 304 0
[22/Nov/2011 12:57:59] "GET /ajax/pages/erlebnisse/ HTTP/1.1" 200 563
[22/Nov/2011 12:58:00] "GET /erlebnisse/alle-erlebnisse/ HTTP/1.1" 200 17114'''
import re
# The outer parentheses capture the whole filename; (?:...) only groups the extensions.
for m in re.findall(r'([-\w]+\.(?:jpg|gif|png))', s):
    print(m)
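To make the difference concrete, here is a minimal side-by-side sketch of the capturing and non-capturing forms. The single log line and the variable names are only illustrative, taken from the sample in the question.

```python
import re

# One line from the sample log in the question.
line = '"GET /media/js/fancybox/blank.gif HTTP/1.1" 304 0'

# Capturing group: findall returns only what the group matched.
capturing = re.findall(r'\w+\.(jpg|gif|png)', line)

# Non-capturing group: the pattern has no groups, so findall returns the whole match.
non_capturing = re.findall(r'\w+\.(?:jpg|gif|png)', line)

print(capturing)      # ['gif']
print(non_capturing)  # ['blank.gif']
```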
You can just add another pair of parentheses around the part you want returned, and put ?: at the start of the inner pair to make it non-capturing:
re.findall(r'/([^/]+\.(?:jpg|gif|png))', f.read())
Note that \w won't match "-", so I would suggest [^/]+ instead.
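A quick sketch of why the hyphen matters, using one of the hyphenated filenames from the question. The variable names are just for illustration. Keep in mind that [^/]+ matches anything except a slash, including spaces, so it is looser than [-\w]+.

```python
import re

# A line from the sample log whose filename contains a hyphen.
line = '"GET /media/js/fancybox/fancybox-x.png HTTP/1.1" 304 0'

# \w+ stops at the hyphen, so the filename is truncated.
word_only = re.findall(r'\w+\.(?:jpg|gif|png)', line)

# Adding "-" to the character class keeps the whole filename.
with_hyphen = re.findall(r'[-\w]+\.(?:jpg|gif|png)', line)

# Matching everything after the last slash also keeps the whole filename.
not_slash = re.findall(r'/([^/]+\.(?:jpg|gif|png))', line)

print(word_only)    # ['x.png']
print(with_hyphen)  # ['fancybox-x.png']
print(not_slash)    # ['fancybox-x.png']
```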
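If you want the whole match, you should be able to find it in group 0; otherwise, if you are looking for another part of the string, you can add extra parentheses.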
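As a rough sketch of the group 0 approach: re.findall() does not expose group 0 when the pattern has groups, so this assumes you switch to re.finditer(), which yields match objects. The names below are only illustrative.

```python
import re

# One line from the sample log in the question.
line = '"GET /media/js/fancybox/blank.gif HTTP/1.1" 304 0'

# The original capturing group is kept; group(0) is the whole match
# and group(1) is the extension captured by the parentheses.
results = [(m.group(0), m.group(1))
           for m in re.finditer(r'[-\w]+\.(jpg|gif|png)', line)]

print(results)  # [('blank.gif', 'gif')]
```

This way you keep both the filename and its extension without rewriting the pattern.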