Regex not matching content in PHP

Question

I am trying to scrape an eBay page such as this one: http://www.ebay.co.uk/sch/Cars-/9801/i.html?_nkw=vw+golf

Everything works great except one of my regular expressions, which just isn't matching the content, so the matches aren't being pushed to $linksArray. I have echoed the page contents to make sure that what I am trying to match is in fact there, and it is. I then call print_r($linksArray), where all the matches should be, but they are not: it is an empty multidimensional array. You can see my live example here: http://www.mycommunity.co.za/marcksack/index.php

Here is my PHP code:

<?php
echo '<form method="POST">
<input type="text" id="url" name="url" size="120" value="' . (isset($_REQUEST["url"]) && !empty($_REQUEST["url"]) ? $_REQUEST["url"] : "") . '"/>
<input type="submit" value="Submit" />
</form>';
flush();

if (isset($_REQUEST["url"]) && !empty($_REQUEST["url"])) {
    $url = $_REQUEST["url"];
    $phones = array();
    for ($page = 1; $page <= 1; $page++) {

        // get page contents

        $contents = file_get_contents($url . "&_pgn=" . $page);
        echo(htmlentities($contents));
        // find all links patterns
        // HERE IS THE PROBLEM
        $pattern = '/class="lvtitle"><a href="(.*)" class="vip"/';
        $linksArray = array();
        preg_match_all($pattern, $contents, $linksArray);
        print_r($linksArray);
        $links = $linksArray[0];

        foreach($links as $link) {
            $pureLink = str_replace("class=\"lvtitle\"><a href=\"", "", $link);
            $pureLink = str_replace("\" class=\"vip\"", "", $pureLink);

            // getting sub page contents

            $subContents = file_get_contents($pureLink);

            // find all phone number patterns

            $subContents = str_replace(" ", "", $subContents);
            $phonePattern = '/07[0-9]{9}/';
            $phonesArray = array();
            preg_match_all($phonePattern, $subContents, $phonesArray);
            foreach($phonesArray[0] as $element) {

                // check if phone not added previously to the phones array

                if (!in_array($element, $phones)) {

                    // add it to the phones array

                    array_push($phones, $element);
                    echo $element . "<br />";
                    flush();
                }
            }
        }
    }

    // print results
    foreach($phones as $phone){
        echo $phone."<br/>";
    }

}

?>

So obviously my question is: what am I doing wrong? Why are the matches not being pushed to my $linksArray variable? I really appreciate your help!

Answer 1

This regex works:

"/ class=\"lvtitle\"><a href=\"([^\"]*)\"  class=\"vip\"/"
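A minimal sketch of using that pattern, run against a hypothetical HTML fragment modeled on the markup this answer describes (the real eBay page may differ). Because the URL sits inside a capture group, preg_match_all puts the bare URLs in $linksArray[1], so the str_replace clean-up from the question is no longer needed.

```php
<?php
// Hypothetical fragment imitating the listing markup described in this answer:
// note the two spaces between the href and class attributes.
$contents = '<h3 class="lvtitle"><a href="http://www.ebay.co.uk/itm/1"  class="vip">Golf 1</a></h3>'
          . '<h3 class="lvtitle"><a href="http://www.ebay.co.uk/itm/2"  class="vip">Golf 2</a></h3>';

$pattern = "/ class=\"lvtitle\"><a href=\"([^\"]*)\"  class=\"vip\"/";

$linksArray = array();
$count = preg_match_all($pattern, $contents, $linksArray);

// $linksArray[0] holds the full matches; $linksArray[1] holds only the captured URLs.
$links = $linksArray[1];

foreach ($links as $link) {
    echo $link . "\n";
}
```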

A few issues with yours:

You were trying to capture the URL using (.*), which is greedy: it matches as much of the line as it can instead of stopping at the closing quote of the href. [^"]* stops the capture at the first quote.
It was not matching at all because eBay's markup has two spaces between the href and class attributes (href="..."  class="vip"), while your pattern allows only one.
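Both points can be checked on a small hypothetical fragment (assumed for illustration, not copied from eBay) with two listings on one line. The question's pattern finds nothing, and preg_match_all then leaves one empty array per group, which matches the empty multidimensional array seen with print_r. A greedy (.*) with the spacing corrected matches once but runs past the first link, while the answer's [^"]* version returns each URL on its own.

```php
<?php
// Hypothetical fragment: two listings on one line, each with two spaces before class="vip".
$html = '<h3 class="lvtitle"><a href="http://example.com/a"  class="vip">A</a></h3>'
      . '<h3 class="lvtitle"><a href="http://example.com/b"  class="vip">B</a></h3>';

// The question's pattern: expects one space before class="vip" and captures with greedy (.*).
$original = '/class="lvtitle"><a href="(.*)" class="vip"/';
// Same greedy capture, but with the spacing corrected, to isolate what (.*) does.
$greedy = '/class="lvtitle"><a href="(.*)"  class="vip"/';
// The answer's pattern: two spaces, and [^"]* so the capture stops at the first quote.
$fixed = '/ class="lvtitle"><a href="([^"]*)"  class="vip"/';

// No match: preg_match_all returns 0 and fills in one empty array per group.
$originalMatches = array();
$originalCount = preg_match_all($original, $html, $originalMatches);

// One match: (.*) runs on to the LAST '"  class="vip"' on the line, swallowing the first link's tail.
$greedyMatches = array();
$greedyCount = preg_match_all($greedy, $html, $greedyMatches);

// Two matches: each capture is just the URL.
$fixedMatches = array();
$fixedCount = preg_match_all($fixed, $html, $fixedMatches);

print_r($originalMatches);
echo $greedyMatches[1][0], "\n";
print_r($fixedMatches[1]);
```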

Also, as has already been mentioned, you should use the eBay API or DOMDocument for this. But in case you are curious, this is why it wasn't working. I hope that helps!
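A rough sketch of the DOMDocument route, assuming (as the regex above implies) that the lvtitle class sits on the element directly containing the link; the real page structure may differ, and the eBay API route is not shown. The fragment is hypothetical and deliberately mixes one- and two-space attribute spacing, since DOMXPath selects by element and attribute rather than by exact text.

```php
<?php
// Hypothetical fragment standing in for the fetched page ($contents in the question).
$contents = '<html><body>'
          . '<h3 class="lvtitle"><a href="http://example.com/a"  class="vip">A</a></h3>'
          . '<h3 class="lvtitle"><a href="http://example.com/b" class="vip">B</a></h3>'
          . '</body></html>';

$doc = new DOMDocument();
// Real-world HTML is rarely valid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($contents);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// <a href> children of any element whose class list contains "lvtitle".
$nodes = $xpath->query('//*[contains(concat(" ", normalize-space(@class), " "), " lvtitle ")]/a[@href]');

$links = array();
foreach ($nodes as $node) {
    $links[] = $node->getAttribute('href');
}
print_r($links);
```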

Answer 1: accepted, 2015-03-27 22:56:52
