Regular expressions aren't one of my strong skills, so I need a bit of help with this. I have this regex to get a PDF URL from a site's source code:
if (preg_match("/http\:\/\/.*?\.pdf/i", $source)) { /* ... */ }
It works fine most of the time, but when a site has link URLs like
http://doc.pdfsomething.com/somemore/name.pdf
the match I get is http://doc.pdf rather than the complete PDF URL.
Any help from a regex guru is appreciated.
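A minimal sketch that reproduces the problem described in the question, assuming a hypothetical `$source` string holding one anchor tag (the HTML is made up for illustration). It passes the third `$matches` argument to `preg_match` so the matched text can actually be inspected.

```php
<?php
// Hypothetical page source: the host name itself contains ".pdf".
$source = '<a href="http://doc.pdfsomething.com/somemore/name.pdf">Download</a>';

// The pattern from the question. The lazy .*? expands one character at a
// time and stops at the first ".pdf" it reaches, which is in the host name.
if (preg_match("/http\:\/\/.*?\.pdf/i", $source, $matches)) {
    echo $matches[0], "\n"; // prints http://doc.pdf
}
```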
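Your pattern uses the lazy quantifier .*?, which stops at the first .pdf it reaches, and in that URL the first .pdf is inside the host name. You can try requiring a word boundary after it: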
preg_match("/http:\/\/.*?\.pdf\b/i", $source)
This means .pdf will only be matched if the pdf is followed by a non-word character (such as ", whitespace, etc.) or by the end of the string.
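A short sketch of the word-boundary version against the same hypothetical `$source` string, plus a second made-up string where the URL sits at the very end, to show that the end of the string also satisfies `\b`.

```php
<?php
$source = '<a href="http://doc.pdfsomething.com/somemore/name.pdf">Download</a>';

// After "doc.pdf" the next character is "s" (a word character), so \b fails
// there and the lazy .*? keeps expanding until ".pdf" is followed by the quote.
preg_match("/http:\/\/.*?\.pdf\b/i", $source, $matches);
echo $matches[0], "\n"; // prints http://doc.pdfsomething.com/somemore/name.pdf

// Hypothetical text where the URL ends the string: \b also matches at the end.
$tail = 'see http://example.com/files/report.pdf';
preg_match("/http:\/\/.*?\.pdf\b/i", $tail, $tailMatches);
echo $tailMatches[0], "\n"; // prints http://example.com/files/report.pdf
```

One caveat worth checking against your own pages: because .*? can also cross spaces and quotes, a non-PDF http:// link earlier in the source can make the match start at that earlier link and run on to a later .pdf. Assuming your URLs never contain whitespace, quotes or angle brackets, replacing .*? with something like [^\s"'<>]*? keeps each match inside a single URL.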
Alternatively, if you know the URL is always going to be followed by a specific character (a double quote ", perhaps?), you could use:
preg_match("/http:\/\/.*?\.pdf\"/i", $source)
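A sketch of the quote-anchored variant, again on a hypothetical `$source` string. With this pattern the closing quote becomes part of the match, so the sketch trims it off; it also shows a lookahead `(?=\")` alternative (a swapped-in technique, not part of the answer above) that requires the quote without consuming it.

```php
<?php
$source = '<a href="http://doc.pdfsomething.com/somemore/name.pdf">Download</a>';

// Quote-anchored pattern: inside a double-quoted PHP string \" becomes a
// plain ", so the regex ends in .pdf" and the quote is included in the match.
preg_match("/http:\/\/.*?\.pdf\"/i", $source, $withQuote);
$url = rtrim($withQuote[0], '"'); // drop the trailing quote
echo $url, "\n"; // prints http://doc.pdfsomething.com/somemore/name.pdf

// Lookahead variant: (?=") checks that a quote follows ".pdf" but does not
// consume it, so $matches[0] is already the bare URL.
preg_match("/http:\/\/.*?\.pdf(?=\")/i", $source, $matches);
echo $matches[0], "\n"; // prints http://doc.pdfsomething.com/somemore/name.pdf
```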