I want to create a crawler using a PHP script

I want to write a PHP script that extracts links from a web page. For example, given the link http://example.com, my crawler should open that page in the background and find all the links matching http://example.com/[any name]/reviews. I tried a regex, but it is not working. Can anybody help me?

<?php
$url = "https://clutch.co/it-services";
$contents = file_get_contents($url);
$pattern = "https://clutch.co/profile/".'/^[a-zA-Z ]*$/'."#review";
$pattern = preg_quote($pattern, '/');
if (preg_match_all($pattern, $contents, $matches)) {
    echo "Found matches:\n";
    foreach ($matches[0] as $urls) {
        echo $urls;
    }
} else {
    echo "No matches found";
}
?>

The regex pattern has some syntax issues:

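The delimiters (/) need to wrap the whole pattern instead of sitting in the middle of it, and any delimiter or special character (such as .) that appears inside the pattern must be escaped. For example, "https://" becomes "https:\\/\\/" in a double-quoted PHP string.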

So the pattern should be:

/https:\/\/clutch\.co\/profile\/[a-zA-Z ]*#review/

A regex fiddle: https://regex101.com/r/OEUQOU/1
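
To show the corrected pattern in context, here is a minimal sketch of how it might be used with preg_match_all. The helper name extractReviewLinks and the sample HTML are made up for illustration and are not part of the original post. The sketch also drops the preg_quote() call from the question, since preg_quote() escapes every regex metacharacter and would turn the character class into literal text. Note that [a-zA-Z ]* only matches letters and spaces, so profile names containing digits or hyphens would need a wider character class.

```php
<?php
// Corrected pattern from the answer: the delimiters wrap the whole expression,
// and the literal "/" and "." characters are escaped.
const REVIEW_LINK_PATTERN = '/https:\/\/clutch\.co\/profile\/[a-zA-Z ]*#review/';

/**
 * Return every unique review link found in the given HTML.
 * (extractReviewLinks is a hypothetical helper name used for this sketch.)
 */
function extractReviewLinks(string $html): array
{
    // preg_match_all returns the number of matches (0 if none) or false on error.
    if (!preg_match_all(REVIEW_LINK_PATTERN, $html, $matches)) {
        return [];
    }
    // $matches[0] holds the full matches; remove duplicates and reindex.
    return array_values(array_unique($matches[0]));
}

// Made-up sample markup standing in for the page fetched with file_get_contents().
$sampleHtml = '<a href="https://clutch.co/profile/acme#review">Acme</a>'
            . '<a href="https://clutch.co/profile/globex#review">Globex</a>'
            . '<a href="https://clutch.co/it-services">Other</a>';

foreach (extractReviewLinks($sampleHtml) as $link) {
    echo $link, "\n";
}
```

With the sample markup above, the loop prints the two profile links and skips the it-services link. In the original script, $sampleHtml would be replaced by the $contents value returned from file_get_contents($url).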
