PHP preg_match_all not matching correctly

Question

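I'm trying to pull some data out of a website's source code. What I want is everything that comes after /collections/ (whatever follows here). My pattern matches "most" of what I'm looking for. The problem appears when preg_match_all reaches a match containing "&": it reads up to the "&" and ignores the rest. Here is my script: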

$homepage = file_get_contents('http://www.harrisfarm.com.au/');
$pattern = '/collections([\w-&\/]*)/i';
preg_match_all($pattern, $processedHomePage, $collections);
print_r($collections);

Note that when the result is printed like this, whatever comes after the "&" is dropped, which gives me:

/collections/seafood/Shellfish-&

But when I run the same pattern against a plain string like this:

 $subject = 'a href="/collections/organic/Pantry/sickmonster/grandma"  <a href="/collections/seafood/Shellfish-&-Crustaceans">Oysters, Shellfish & Crustaceans';

it gives me everything I want:

/collections/seafood/Shellfish-&-Crustaceans

So I'm wondering... why does this happen? I'm really stuck.

Answer 1

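The code you provided works fine when you pass $homepage instead of $processedHomePage to preg_match_all.

By the way: you should escape the minus sign inside the square brackets (or put it at the very start or end of the bracket expression), although surprisingly it makes no difference in your case: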

$pattern = '/collections([-\w&\/]*)/i';

For more information, see http://php.net/manual/regexp.reference.meta.php.
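
As a rough illustration of the hyphen advice, the sketch below runs three explicit spellings of the character class against a sample link from the question. The array keys and variable names are made up for this example. The unescaped `[\w-&\/]` form is left out on purpose: it was accepted by the PCRE version of that time, but as far as I know PHP 7.3 and later (built on PCRE2) report an "invalid range in character class" compilation error for it.

```php
<?php
// Sample link copied from the question's test string.
$subject = '<a href="/collections/seafood/Shellfish-&-Crustaceans">';

// Three unambiguous ways to include a literal hyphen in a character class.
$patterns = [
    'hyphen first'   => '/collections([-\w&\/]*)/i',
    'hyphen last'    => '/collections([\w&\/-]*)/i',
    'hyphen escaped' => '/collections([\w\-&\/]*)/i',
];

$results = [];
foreach ($patterns as $label => $pattern) {
    preg_match($pattern, $subject, $m);
    $results[$label] = $m[1];          // text captured after "collections"
    echo $label, ': ', $m[1], "\n";    // each prints /seafood/Shellfish-&-Crustaceans
}
```

All three variants capture the same text, so the choice between them is a matter of readability.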

Answer 2

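I figured out what the problem was; maybe this will help someone else later.

I had run the source of http://www.harrisfarm.com.au/ through htmlspecialchars() and then read the result as a string. That turns some special characters (& and a few others) into longer, multi-character entities.

In particular, & is converted to &amp;, which contains a ; that is not in my regex's character class. Because ; is not part of the character class, the regex stops matching at that point.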
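
A minimal sketch of this effect, assuming a hypothetical one-line sample in place of the downloaded page and the question's character class with the hyphen moved to the front:

```php
<?php
// Hypothetical one-line sample standing in for the downloaded page.
$html = '<a href="/collections/seafood/Shellfish-&-Crustaceans">';

// htmlspecialchars() rewrites & as &amp; (and <, >, " as entities too).
$escaped = htmlspecialchars($html);

// The question's character class, hyphen placed first; note there is no ";".
$pattern = '/collections([-\w&\/]*)/i';

preg_match_all($pattern, $html, $raw);
preg_match_all($pattern, $escaped, $converted);

echo $raw[0][0], "\n";        // collections/seafood/Shellfish-&-Crustaceans
echo $converted[0][0], "\n";  // collections/seafood/Shellfish-&amp
```

The second match ends right before the ; of the entity. Printed into an HTML page, the trailing &amp fragment is typically rendered as a bare &, which would explain the truncated output shown in the question.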

Answer 3

Try this:

$re = "/\\/collections([\\w\\-\\&\\/;]*)/mi";
$str = "<a href=\"/collections/seafood/Shellfish-&amp;-Crustaceans\">Oysters, Shellfish & Crustaceans';\n<a href=\"/collections/seafood/Shellfish-&-Crustaceans\">Oysters,collections Shellfish & Crustaceans';";

preg_match_all($re, $str, $matches);


Your updated code:

$homepage = file_get_contents('http://www.harrisfarm.com.au/');
$pattern = "/\\/collections([\\w\\-\\&\\/;]*)/mi";
preg_match_all($pattern, $homepage, $collections);
print_r($collections);
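
With ; in the character class, matches taken from entity-encoded text still contain &amp;. If the plain path is wanted, one possible follow-up step (a sketch, not part of the answer's code) is to decode the matches with htmlspecialchars_decode(). The sample string below reuses the two links from the test string above; $paths is a name chosen for this example.

```php
<?php
// The two links from the test string above: one entity-encoded, one plain.
$str = '<a href="/collections/seafood/Shellfish-&amp;-Crustaceans">' . "\n"
     . '<a href="/collections/seafood/Shellfish-&-Crustaceans">';

// Same pattern as above, written in a single-quoted string.
$pattern = '/\/collections([\w\-&\/;]*)/i';
preg_match_all($pattern, $str, $matches);

// $matches[0] holds the full matches, $matches[1] the part after /collections.
// Decoding turns &amp; back into & and leaves a plain & untouched.
$paths = array_map('htmlspecialchars_decode', $matches[0]);
print_r($paths);
```

Both entries of $paths come out as /collections/seafood/Shellfish-&-Crustaceans, so encoded and plain sources can be handled uniformly.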

3 answers

Answer 1
0 2014-11-24 22:22:02

Answer 2
0 2014-11-24 22:57:50

Answer 3
0
