
How to crawl and download all PDF files linked from an HTML page?

This is my code to crawl all PDF links, but it doesn't work. How do I download the files from those links and save them to a folder on my computer?

<?php
set_time_limit(0);
include 'simple_html_dom.php';

$url = 'http://example.com';
$html = file_get_html($url) or die ('invalid url');

//extract pdf links
foreach($html->find('a[href=[^"]*\.pdf]') as $element)
echo $element->href.'<br>';
?>
foreach($htnl->find('a[href=[^"]*\.pdf]') as element)
           ^---typo. should be an 'm'        ^---typo. need a $ here

How does your code "not work", other than because of the above typos?

A simpler solution would be:

foreach ($html->find('a[href$=pdf]') as $element)

https://simplehtmldom.sourceforge.io/manual.htm

[attribute$=value] Matches elements that have the specified attribute and it ends with a certain value.
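To answer the second half of the question (saving the files to a folder), you can fetch each matched link with `file_get_contents` and write it to disk with `file_put_contents`. Below is a minimal sketch, assuming the `href` values are absolute URLs and `allow_url_fopen` is enabled; the helper names `pdfFileName` and `downloadPdfs` and the `pdfs/` directory are my own, not from the original post:

```php
<?php
// Derive a safe local filename from a PDF URL.
// Only the path component is used, so query strings don't end up in the name.
function pdfFileName(string $url): string {
    return basename(parse_url($url, PHP_URL_PATH));
}

// Fetch each URL and write it into $dir (created if missing).
function downloadPdfs(array $urls, string $dir): void {
    if (!is_dir($dir)) {
        mkdir($dir, 0755, true);
    }
    foreach ($urls as $url) {
        $data = file_get_contents($url); // simple HTTP GET, no retries
        if ($data !== false) {
            file_put_contents($dir . '/' . pdfFileName($url), $data);
        }
    }
}
```

Combined with the crawl loop above, you would collect `$element->href` into an array and then call `downloadPdfs($urls, 'pdfs')`. If the page uses relative links, you would first need to resolve them against the base URL before fetching.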

Have you looked into phpquery? http://code.google.com/p/phpquery/
