Scrape entire site with relative links

I am currently working on a PHP script based on Symfony DomCrawler and Goutte. They make it fairly easy to scrape by tag or selector, but is there an easy way to scrape the entire site and prepend the full URL to every link in the source code?

When I create an instance of my crawl class I specify the page, and I just want to prepend that URL to all the local links on the page. Any ideas?

Are you tied to PHP? If not, you could use Zillabyte's domain_crawler component from the shell:

$ zillabyte execute domain_crawl "example.com" --output_file some_file
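
If staying in PHP is preferable, the link rewriting the question asks about can be sketched with PHP's built-in DOM extension. This is only a rough illustration, not a full URL resolver: the helper names `absolutize` and `absolutizeLinks` and the example.com URLs are made up for this example, and `../` segments and query-only links are not normalized.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: resolve $href against the page URL $base.
 * Handles absolute, fragment-only, protocol-relative, root-relative
 * and path-relative links. Does not collapse "../" segments.
 */
function absolutize(string $href, string $base): string
{
    // Already absolute: starts with a scheme such as http:, https:, mailto:
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $href) === 1) {
        return $href;
    }
    // Empty or fragment-only link: points at the base page itself.
    if ($href === '' || $href[0] === '#') {
        return explode('#', $base, 2)[0] . $href;
    }

    $parts  = parse_url($base);
    $scheme = $parts['scheme'] ?? 'http';
    $origin = $scheme . '://' . ($parts['host'] ?? '')
            . (isset($parts['port']) ? ':' . $parts['port'] : '');

    // Protocol-relative link: reuse the scheme of the base page.
    if (strpos($href, '//') === 0) {
        return $scheme . ':' . $href;
    }
    // Root-relative link: attach to scheme + host (+ port).
    if (strpos($href, '/') === 0) {
        return $origin . $href;
    }
    // Path-relative link: keep the base path up to its last slash.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);

    return $origin . $dir . $href;
}

/**
 * Hypothetical helper: rewrite every href and src attribute in $html
 * to an absolute URL and return the resulting HTML document.
 */
function absolutizeLinks(string $html, string $base): string
{
    // Collect parse warnings from imperfect markup instead of printing them.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    foreach (['href', 'src'] as $name) {
        foreach ($xpath->query('//*[@' . $name . ']') as $element) {
            $element->setAttribute(
                $name,
                absolutize($element->getAttribute($name), $base)
            );
        }
    }

    return $doc->saveHTML();
}

// Example usage with a made-up page URL.
$html = '<a href="/contact">Contact</a> <a href="about.html">About</a>';
echo absolutizeLinks($html, 'http://example.com/docs/index.html');
```

Note that `loadHTML` wraps a fragment in a full document, so the returned string includes `<html>` and `<body>` tags. Symfony's DomCrawler also resolves links itself: if the page URI is passed as the second argument to the `Crawler` constructor, the `Link` objects returned by `links()` expose `getUri()`, which returns the absolute URI.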
