PDF regexp match in PHP

Question

Regexp isn't one of my strong skills, so I need a bit of help with this. I have this regexp to get a PDF URL from a site's source code:

if (preg_match("/http\:\/\/.*?\.pdf/i", $source)) {

which works OK most of the time, but, for example, when I get sites with link URLs like

I get http://doc.pdf as the match instead of the complete PDF URL.

Any regexp guru, your help is appreciated.

Answer 1

You can try matching on a word boundary:

preg_match("/http:\/\/.*?\.pdf\b/i", $source)

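This means .pdf is only matched when it is followed by a non-word character (such as ", whitespace, or punctuation) or by the end of the string. A .pdf that is followed by a letter, digit, or underscore is skipped, so the lazy .*? keeps going until it reaches the real end of the URL.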
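To see the difference the word boundary makes, here is a minimal sketch that runs both patterns on the same input. The $source string and its host name are made up for illustration (they are not taken from the question); the host is chosen so that ".pdf" appears in the middle of the URL as well as at the end.

```php
<?php
// Hypothetical page source; the host name is invented so that ".pdf"
// shows up inside the host as well as at the end of the URL.
$source = '<a href="http://doc.pdfhost.example/files/report.pdf">Report</a>';

// Without the boundary, the lazy .*? stops at the first ".pdf" it finds.
preg_match('/http:\/\/.*?\.pdf/i', $source, $m);
$withoutBoundary = $m[0];

// With \b, a ".pdf" followed by another word character ("h") is skipped.
preg_match('/http:\/\/.*?\.pdf\b/i', $source, $m);
$withBoundary = $m[0];

echo $withoutBoundary, "\n"; // http://doc.pdf
echo $withBoundary, "\n";    // http://doc.pdfhost.example/files/report.pdf
```

Passing a third argument to preg_match is what gives you the matched text; without it you only learn whether there was a match.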

Alternatively, if you know the URL is always going to be followed by a specific character (a double quote ", for example), you could include that character in the pattern:

preg_match("/http:\/\/.*?\.pdf\"/i", $source)
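One thing to watch with this version is that the closing quote becomes part of the match. A possible way around that, sketched below, is to wrap the URL part in a capture group and read it from the matches array. The parentheses and the $source string are my additions for illustration, not part of the pattern above.

```php
<?php
// Hypothetical page source with a double-quoted href attribute.
$source = '<a href="http://doc.pdfhost.example/files/report.pdf">Report</a>';

// Same pattern as above, with parentheses added around the URL part.
preg_match("/(http:\/\/.*?\.pdf)\"/i", $source, $m);

$fullMatch = $m[0]; // whole match, including the trailing double quote
$url       = $m[1]; // capture group 1: the URL without the quote

echo $fullMatch, "\n"; // http://doc.pdfhost.example/files/report.pdf"
echo $url, "\n";       // http://doc.pdfhost.example/files/report.pdf
```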