What is wrong with my PHP regex?

Question

I'm trying to pull a specific link from a feed where all of the content is on one line and multiple links are present. The one I want has "[link]" as the content of the A tag. Here's my example:

<a href="google.com/">test1</a> <a href="google.com/">test2</a> <a href="http://www.amazingpage.com/">[link]</a> <a href="google.com/">test3</a><a href="google.com/">test4</a>
... could be more links before and/or after

How do I isolate just the href of the link whose content is "[link]"?

This regex ends at the right place in the block I want, but it starts at the first link:

(?<=href\=\").*?(?=\[link\])

Any help would be greatly appreciated! Thanks.

Answer 1

Try this updated regex:

(?<=href\=\")[^<]*?(?=\">\[link\])

The problem is that the dot matches too many characters, including the tags of the earlier links. To get the right href, restrict the match to [^<]*? so that it cannot run past the start of another tag, and require ">[link] immediately after it.
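As a minimal sketch of how that pattern could be used from PHP (the variable names $feed, $pattern and $href are illustrative and not part of the original answer), preg_match returns the URL as the whole match because the lookbehind and lookahead consume nothing:

```php
<?php
// Sample feed line from the question, split only for readability.
$feed = '<a href="google.com/">test1</a> <a href="google.com/">test2</a> '
      . '<a href="http://www.amazingpage.com/">[link]</a> '
      . '<a href="google.com/">test3</a><a href="google.com/">test4</a>';

// The lookbehind pins the start just after href=", [^<]*? cannot cross
// into another tag, and the lookahead demands ">[link] right after the URL.
$pattern = '/(?<=href=")[^<]*?(?=">\[link\])/';

$href = null;
if (preg_match($pattern, $feed, $m) === 1) {
    $href = $m[0]; // the whole match is the URL itself
    echo $href, PHP_EOL;
}
```

For the two earlier links the attempt fails as soon as [^<] reaches the "<" of the closing tag, so the engine moves on until it reaches the third href.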

Answer 2

Alternatively :)

This code:

$string = '<a href="google.com/">test1</a> <a href="google.com/">test2</a> <a href="http://www.amazingpage.com/">[link]</a> <a href="google.com/">test3</a><a href="google.com/">test4</a>';
$regex = '/href="([^"]*)">\[link\]/i';
$result = preg_match($regex, $string, $matches);
var_dump($matches);

will return:

array(2) {
  [0] =>
  string(41) "href="http://www.amazingpage.com/">[link]"
  [1] =>
  string(27) "http://www.amazingpage.com/"
}

Answer 3

You can avoid regular expressions altogether and use the DOM to do this.

// loadHTML() is an instance method; calling it statically fails on PHP 8.
$doc = new DOMDocument();
$doc->loadHTML('
     <a href="google.com/">test1</a>
     <a href="google.com/">test2</a>
     <a href="http://www.amazingpage.com/">[link]</a>
     <a href="google.com/">test3</a>
     <a href="google.com/">test4</a>
');

foreach ($doc->getElementsByTagName('a') as $link) {
   if ($link->nodeValue == '[link]') {
     echo $link->getAttribute('href');
   }
}

Answer 4

With DOMDocument and XPath:

$dom = new DOMDocument();
$dom->loadHTML($yourHTML);
$xpath = new DOMXPath($dom);

foreach ($xpath->query('//a[. = "[link]"]/@href') as $node) {
    echo $node->nodeValue;
}

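or if you are looking for only one result:

$dom = new DOMDocument();
$dom->loadHTML($yourHTML);
$xpath = new DOMXPath($dom);

// The parentheses make [1] pick the first match in the whole document
// rather than the first match under each parent element.
$nodeList = $xpath->query('(//a[. = "[link]"])[1]/@href');
if ($nodeList->length) {
    echo $nodeList->item(0)->nodeValue;
}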

XPath query details:

//a              # 'a' tag everywhere in the DOM tree
[. = "[link]"]   # (condition) which has "[link]" as value 
/@href           # "href" attribute
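The query above could be wrapped in a small function. This is only a sketch: the function name firstLinkHref is hypothetical, and the libxml error handling is an addition for fragments that are not complete HTML documents, not something the original answer included.

```php
<?php
// Hypothetical helper: returns the href of the first <a> whose text is
// exactly "[link]", or an empty string when no such link exists.
function firstLinkHref(string $html): string
{
    // Collect libxml warnings internally instead of emitting them;
    // feed fragments are rarely well-formed HTML documents.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // string(...) converts the first selected attribute node to its value.
    return $xpath->evaluate('string((//a[. = "[link]"])[1]/@href)');
}

$feed = '<a href="google.com/">test1</a> <a href="google.com/">test2</a> '
      . '<a href="http://www.amazingpage.com/">[link]</a> '
      . '<a href="google.com/">test3</a><a href="google.com/">test4</a>';

$href = firstLinkHref($feed);
echo $href, PHP_EOL;
```

Using evaluate() with string() avoids handling a node list when only one value is wanted.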

The reason your regex pattern doesn't work:

The regex engine walks the string from left to right and, at each position, tries to make the pattern succeed. So even with a non-greedy quantifier, you always get the leftmost match: starting right after the first href=", .*? simply keeps expanding until it reaches [link].
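The leftmost-match behaviour can be observed directly by running the pattern from the question against the sample line. In this sketch the variable names ($feed, $original, $leftmost) are illustrative only:

```php
<?php
$feed = '<a href="google.com/">test1</a> <a href="google.com/">test2</a> '
      . '<a href="http://www.amazingpage.com/">[link]</a> '
      . '<a href="google.com/">test3</a><a href="google.com/">test4</a>';

// Pattern from the question, unchanged.
$original = '/(?<=href\=\").*?(?=\[link\])/';

preg_match($original, $feed, $m);
$leftmost = $m[0];

// The match begins right after the FIRST href=" and the lazy .*? grows
// across two whole links until the lookahead finally sees [link].
var_dump(strpos($leftmost, 'google.com/') === 0); // starts at the first link
var_dump(substr($leftmost, -2) === '">');         // ends just before [link]
```

Laziness only affects how far a match extends once a starting position has succeeded; it never makes the engine skip a starting position that can succeed.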

4 answers

Answer 1: score 3, accepted, 2015-03-01 23:50:20
Answer 2: score 2, 2015-03-01 23:55:32
Answer 3: score 1, 2015-03-02 00:00:04
Answer 4: score 1, 2015-03-02 00:12:27