I want to create a crawler using PHP script
I want to create a PHP script that extracts links from a web page. For example, given the link http://example.com, my crawler should open that page in the background and find all the links matching http://example.com/[any name]/reviews.
I tried a regex, but it is not working. Can anybody help me?
<?php
$url = "https://clutch.co/it-services";
$contents = file_get_contents($url);
$pattern = "https://clutch.co/profile/".'/^[a-zA-Z ]*$/'."#review";
$pattern = preg_quote($pattern, '/');
if (preg_match_all($pattern, $contents, $matches)) {
    echo "Found matches:\n";
    foreach ($matches[0] as $urls) {
        echo $urls;
    }
}
else {
    echo "No matches found";
}
?>
The regex pattern has some syntax issues: the delimiters / need to be outside of the pattern, and any delimiters and special characters (such as .) inside the pattern ("https://") need to be escaped ("https:\/\/").
So the pattern should be:
/https:\/\/clutch\.co\/profile\/[a-zA-Z ]*#review/
A regex fiddle: https://regex101.com/r/OEUQOU/1
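As a rough sketch of how the corrected pattern could be used in the script from the question: the helper name findReviewLinks() and the sample HTML string are assumptions made for illustration, and the sample stands in for the result of file_get_contents($url) so the snippet runs without network access. The sketch does not call preg_quote() on the whole pattern, because that would also escape the character class and turn it into literal text. The character class [a-zA-Z ]* is taken as-is from the pattern above; if the real profile names contain digits or hyphens, it would probably need to be widened.

```php
<?php
// Sketch only: findReviewLinks() is a hypothetical helper, not from the original post.
function findReviewLinks(string $html): array
{
    // The delimiters wrap the whole pattern; literal '/' and '.' are escaped.
    $pattern = '/https:\/\/clutch\.co\/profile\/[a-zA-Z ]*#review/';

    // preg_match_all() returns the number of matches (0 if none) or false on error.
    if (!preg_match_all($pattern, $html, $matches)) {
        return [];
    }

    // $matches[0] holds the full matches; drop duplicates and reindex.
    return array_values(array_unique($matches[0]));
}

// Assumed sample input, standing in for file_get_contents($url).
$sample = '<a href="https://clutch.co/profile/acme#review">Acme</a> '
        . '<a href="https://clutch.co/profile/acme#review">Acme again</a> '
        . '<a href="https://clutch.co/profile/other#summary">Other</a>';

foreach (findReviewLinks($sample) as $link) {
    echo $link, "\n";
}
```

With the sample string above, only the acme link is printed, and only once, since the duplicate is removed and the #summary link does not match.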