Regular Expression not matching content in PHP
I am trying to scrape an eBay page such as this one: http://www.ebay.co.uk/sch/Cars-/9801/i.html?_nkw=vw+golf
Everything works except that one of my regular expressions isn't matching the content, so no matches are being pushed to $linksArray.
I have output the contents to make sure that what I am trying to match is in fact there, and it is. I then call print_r($linksArray), where all the matches should be, but they are not: it is an empty multidimensional array. You can see my live example here: http://www.mycommunity.co.za/marcksack/index.php
Here is my PHP code:
<?php
echo '<form method="POST">
<input type="text" id="url" name="url" size="120" value="' . (isset($_REQUEST["url"]) && !empty($_REQUEST["url"]) ? $_REQUEST["url"] : "") . '"/>
<input type="submit" value="Submit" />
</form>';
flush();
if (isset($_REQUEST["url"]) && !empty($_REQUEST["url"])) {
    $url = $_REQUEST["url"];
    $phones = array();
    for ($page = 1; $page <= 1; $page++) {
        // get page contents
        $contents = file_get_contents($url . "&_pgn=" . $page);
        echo(htmlentities($contents));
        // find all link patterns
        // HERE IS THE PROBLEM
        $pattern = '/class="lvtitle"><a href="(.*)" class="vip"/';
        $linksArray = array();
        preg_match_all($pattern, $contents, $linksArray);
        print_r($linksArray);
        $links = $linksArray[0];
        foreach ($links as $link) {
            $pureLink = str_replace("class=\"lvtitle\"><a href=\"", "", $link);
            $pureLink = str_replace("\" class=\"vip\"", "", $pureLink);
            // get sub page contents
            $subContents = file_get_contents($pureLink);
            // find all phone number patterns
            $subContents = str_replace(" ", "", $subContents);
            $phonePattern = '/07[0-9]{9}/';
            $phonesArray = array();
            preg_match_all($phonePattern, $subContents, $phonesArray);
            foreach ($phonesArray[0] as $element) {
                // check that the phone was not previously added to the phones array
                if (!in_array($element, $phones)) {
                    // add it to the phones array
                    array_push($phones, $element);
                    echo $element . "<br />";
                    flush();
                }
            }
        }
    }
    // print results
    foreach ($phones as $phone) {
        echo $phone . "<br/>";
    }
}
?>
So my question is: what am I doing wrong? Why are the matches not being pushed to my $linksArray variable? I really appreciate your help!
This regex works:
"/ class=\"lvtitle\"><a href=\"([^\"]*)\" class=\"vip\"/"
A few issues with yours:
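One difference that can be seen by comparing the two patterns is that (.*) became ([^"]*). The dot is greedy and only stops at a newline, so when several listings sit on the same line it runs from the first href to the last class="vip" instead of stopping at the closing quote. The sketch below shows this on a made-up single-line fragment ($html and the example.com URLs are hypothetical, not real eBay markup). Whether this is the only reason the original pattern failed on the live page is an assumption.

```php
<?php
// Hypothetical one-line fragment with two listings (not real eBay markup).
$html = '<h3 class="lvtitle"><a href="http://example.com/a" class="vip">A</a></h3>'
      . '<h3 class="lvtitle"><a href="http://example.com/b" class="vip">B</a></h3>';

// Pattern from the question: greedy dot.
$greedy = '/class="lvtitle"><a href="(.*)" class="vip"/';
// Pattern from the answer, written with single quotes so the double quotes
// need no escaping: capture everything up to the next double quote.
$negated = '/ class="lvtitle"><a href="([^"]*)" class="vip"/';

$greedyMatches = array();
$negatedMatches = array();
preg_match_all($greedy, $html, $greedyMatches);
preg_match_all($negated, $html, $negatedMatches);

// The greedy capture swallows both listings in a single match.
echo count($greedyMatches[1]) . "\n";
echo $greedyMatches[1][0] . "\n";

// The negated character class yields one clean URL per listing.
// Index 1 holds the captured groups, index 0 the full matches.
print_r($negatedMatches[1]);
```

Reading index 1 instead of index 0 also makes the str_replace cleanup unnecessary. With the leading space in the working pattern, stripping the full match the way the question does would leave a space in front of each URL.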
Also, as has already been mentioned, you should use the API or DOMDocument for this. But in case you are curious, this is why it wasn't working. I hope that helps!
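For the DOMDocument route, a minimal sketch could look like the following. It assumes that PHP's DOM extension is enabled and that the listing links are a elements with class vip directly inside an element with class lvtitle, which is what the regex implies. The sample markup and the helper name extractListingLinks are hypothetical.

```php
<?php
// Hypothetical helper: returns the href of every <a class="vip"> that is a
// direct child of an element whose class attribute contains "lvtitle".
function extractListingLinks($html)
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid, so collect parse warnings quietly
    // and restore the previous libxml error setting afterwards.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query(
        '//*[contains(concat(" ", normalize-space(@class), " "), " lvtitle ")]'
        . '/a[contains(concat(" ", normalize-space(@class), " "), " vip ")]'
    );

    $links = array();
    foreach ($nodes as $node) {
        $links[] = $node->getAttribute('href');
    }
    return $links;
}

// Made-up markup that imitates the structure targeted by the regex.
// The second link has its attributes in a different order on purpose.
$sample = '<html><body>'
        . '<h3 class="lvtitle"><a href="http://example.com/item/1" class="vip">Golf 1</a></h3>'
        . '<h3 class="lvtitle"><a class="vip" href="http://example.com/item/2">Golf 2</a></h3>'
        . '<a href="http://example.com/other">Unrelated</a>'
        . '</body></html>';

print_r(extractListingLinks($sample));
```

Unlike the regex, this does not depend on attribute order or on the exact spacing inside the tag, so small markup changes on the page are less likely to break it.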