
How to crawl and download all PDF files linked from an HTML page?

This is my code to crawl all PDF links, but it doesn't work. How do I download the files from those links and save them to a folder on my computer?

<?php
set_time_limit(0);
include 'simple_html_dom.php';

$url = 'http://example.com';
$html = file_get_html($url) or die ('invalid url');

//extract pdf links
foreach($html->find('a[href=[^"]*\.pdf]') as $element)
echo $element->href.'<br>';
?>
foreach($htnl->find('a[href=[^"]*\.pdf]') as element)
           ^---typo. should be an 'm'        ^---typo. need a $ here

How does your code "not work", other than because of the above typos?

A simpler solution would be:

foreach ($html->find('a[href$=pdf]') as $element)

https://simplehtmldom.sourceforge.io/manual.htm

[attribute$=value] Matches elements that have the specified attribute and it ends with a certain value.
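To answer the second half of the question (saving the files to a folder), you can fetch each matched link with `file_get_contents` and write it to disk with `file_put_contents`. Below is a minimal sketch, assuming the `href` values are absolute URLs and `allow_url_fopen` is enabled; the helper names `pdfFileName` and `downloadPdfs` and the `pdfs/` directory are my own, not from the original post:

```php
<?php
// Derive a safe local filename from a PDF URL.
// Only the path component is used, so query strings don't end up in the name.
function pdfFileName(string $url): string {
    return basename(parse_url($url, PHP_URL_PATH));
}

// Fetch each URL and write it into $dir (created if missing).
function downloadPdfs(array $urls, string $dir): void {
    if (!is_dir($dir)) {
        mkdir($dir, 0755, true);
    }
    foreach ($urls as $url) {
        $data = file_get_contents($url); // simple HTTP GET, no retries
        if ($data !== false) {
            file_put_contents($dir . '/' . pdfFileName($url), $data);
        }
    }
}
```

Combined with the crawl loop above, you would collect `$element->href` into an array and then call `downloadPdfs($urls, 'pdfs')`. If the page uses relative links, you would first need to resolve them against the base URL before fetching.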

Have you looked into phpquery? http://code.google.com/p/phpquery/
