The title is a bit unclear, but I couldn't find a better way to describe my problem. I am trying to get some pictures from Reddit, and I ran into problems when trying to extract the image URLs.
$url = 'http://www.reddit.com/r/pics';
$str = file_get_contents($url);
This is what I currently have. To get at the image URL, I need to find this part of the HTML:
`<a class="thumbnail may-blank " href="http://i.imgur.com/K4q9i5c.jpg">`
While trying to figure out how to get the href of every such link on the page, the only approach I could think of was regex: find the part `<a class="thumbnail may-blank "`, then find the closing `>` sign. That gives me the whole tag, from which I can eventually extract the URL of the picture.
I have been trying to write a regex that matches it, but I couldn't get it to work. Maybe someone here can help me, or has a better solution.
It would be highly appreciated. Thanks.
You shouldn't use regex to parse HTML; it's really a bad choice.
But if you really have to, something like this might work.
(untested)
# '/(?s)<a\s+class\s*=\s*(["\'])(?:(?!\1|[<>]).)*\1\s+href\s*=\s*(["\'])((?:(?!\2|[<>]).)*)\2/'
(?s)                               # Dot-All
<a \s+ class \s* = \s*             # class
( ["'] )                           # (1), delimiter
(?:
    (?! \1 | [<>] )
    .
)*
\1                                 # delimiter
\s+
# [^<>]*  ( add if necessary )
href \s* = \s*                     # href
( ["'] )                           # (2), delimiter
(                                  # (3 start), Url
    (?:
        (?! \2 | [<>] )
        .
    )*
)                                  # (3 end)
\2                                 # delimiter
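To show how the compact pattern above might be applied, here is a minimal sketch using `preg_match_all`. The function name `extractThumbnailUrls` and the sample markup are assumptions made for illustration, and the class part of the pattern has been narrowed so that it only accepts class values starting with `thumbnail`; that narrowing is an adaptation, not part of the original pattern. Capture group 3 holds the URL.

```php
<?php
// Hypothetical helper: returns the href of every <a> whose class attribute
// starts with "thumbnail" and is immediately followed by an href attribute.
function extractThumbnailUrls(string $html): array
{
    // Same structure as the compact pattern above; only "thumbnail\b" was
    // added after the opening quote of the class value (an adaptation).
    $pattern = '/(?s)<a\s+class\s*=\s*(["\'])thumbnail\b(?:(?!\1|[<>]).)*\1'
             . '\s+href\s*=\s*(["\'])((?:(?!\2|[<>]).)*)\2/';

    preg_match_all($pattern, $html, $matches);

    // Group 3 is the URL between the href delimiters.
    return $matches[3];
}

// Sample markup standing in for the downloaded page (illustrative only).
$sample = '<a class="thumbnail may-blank " href="http://i.imgur.com/K4q9i5c.jpg">'
        . '<a class="title" href="http://example.com/other">';

print_r(extractThumbnailUrls($sample));
```

With the sample above, only the imgur link should be returned, since the second anchor's class does not start with `thumbnail`. The pattern still depends on `class` coming directly before `href`, so it will miss tags with a different attribute order.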
If you only need the href from the a tags, try:
'<a.*href=\"(.*)\".*$'
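Following the advice earlier in the thread against parsing HTML with regex, the same extraction can be sketched with PHP's DOM extension instead. This is only one possible approach, assuming the `dom` and `libxml` extensions are available; the function name `thumbnailHrefs` and the sample markup are made up for illustration.

```php
<?php
// Hypothetical helper: collect the href of every <a> element whose
// space-separated class list contains $class.
function thumbnailHrefs(string $html, string $class = 'thumbnail'): array
{
    $doc = new DOMDocument();

    // Real-world pages are rarely valid HTML, so collect libxml warnings
    // internally instead of emitting them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Pad the normalized class attribute with spaces so that "thumbnail"
    // matches as a whole class name, not as a substring of another one.
    $query = sprintf(
        '//a[contains(concat(" ", normalize-space(@class), " "), " %s ")]/@href',
        $class
    );

    $urls = [];
    foreach ($xpath->query($query) as $attr) {
        $urls[] = $attr->nodeValue;
    }
    return $urls;
}

// Sample markup standing in for the downloaded page (illustrative only).
$sample = '<html><body>'
        . '<a class="thumbnail may-blank " href="http://i.imgur.com/K4q9i5c.jpg">pic</a>'
        . '<a class="title" href="http://example.com/other">text</a>'
        . '</body></html>';

print_r(thumbnailHrefs($sample));
```

Unlike the regex versions, this does not care about attribute order or quoting style, because the parser has already resolved those before the XPath query runs. In the original scenario the `$str` returned by `file_get_contents($url)` would be passed in place of `$sample`.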