I want to create a crawler using a PHP script

Question

I want to create a PHP crawler script for a website. It only needs to extract the links found on a given page. For example, given the link http://example.com, my crawler should open that page in the background and find all links matching http://example.com/[any name]/reviews. I tried a regex, but it is not working. Can anybody help me?

<?php
$url="https://clutch.co/it-services";
$contents =file_get_contents($url);
$pattern = "https://clutch.co/profile/".'/^[a-zA-Z ]*$/'."#review";
$pattern = preg_quote($pattern, '/');
if(preg_match_all($pattern, $contents, $matches)){
   echo "Found matches:\n";
   foreach ($matches[0] as $urls) {
    echo $urls;
  }
}
else{
   echo "No matches found";
}
?>

Answer 1

The regex pattern has some syntax issues:

The delimiters / need to enclose the whole pattern, and any delimiter or special character (such as .) inside the pattern needs to be escaped, so "https://" becomes "https:\/\/".

So the pattern should be:

/https:\/\/clutch\.co\/profile\/[a-zA-Z ]*#review/

A regex fiddle: https://regex101.com/r/OEUQOU/1
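
For reference, here is a minimal sketch of how the corrected pattern could be applied with preg_match_all(). The sample HTML is made up for illustration and stands in for the output of file_get_contents(); the real page markup may differ. The sketch also drops the preg_quote() call from the question, since quoting the entire pattern would escape the character class and make it match only as literal text.

```php
<?php
// Hypothetical sample markup standing in for the fetched page contents.
$contents = <<<HTML
<a href="https://clutch.co/profile/acme#review">Acme</a>
<a href="https://clutch.co/profile/foo bar#review">Foo Bar</a>
<a href="https://clutch.co/it-services">Other</a>
HTML;

// Delimiters wrap the whole pattern; the slashes and the dot are escaped.
$pattern = '/https:\/\/clutch\.co\/profile\/[a-zA-Z ]*#review/';

// preg_match_all() returns the number of full matches (or false on error).
$count = preg_match_all($pattern, $contents, $matches);
$links = $count ? array_unique($matches[0]) : [];

foreach ($links as $link) {
    echo $link, "\n";
}
```

Note that the character class [a-zA-Z ] only allows letters and spaces. If the real profile names contain digits or hyphens (an assumption about the site, not something confirmed here), the class would need to be widened accordingly.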
