What's wrong with my PHP regex?

I'm trying to pull a specific link from a feed where all of the content is on one line and there are multiple links present. The one I want has the content "[link]" in the A tag. Here's my example:

<a href="google.com/">test1</a> <a href="google.com/">test2</a> <a href="http://www.amazingpage.com/">[link]</a> <a href="google.com/">test3</a><a href="google.com/">test4</a>
... could be more links before and/or after

How do I isolate just the href of the link whose content is "[link]"?

This regex ends at the correct place in the block I want, but starts at the first link:

(?<=href\=\").*?(?=\[link\])

Any help would be greatly appreciated! Thanks.

Try this updated regex:

(?<=href\=\")[^<]*?(?=\">\[link\])

The problem is that the dot matches any character, so .*? can run across tag boundaries starting from the first href. To get the right href, restrict the match to [^<]*?, which cannot cross a < and therefore has to start inside the same tag that contains [link].
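A minimal sketch of how that pattern might be applied from PHP, assuming the sample markup from the question is held in a variable called $html (a name chosen here for illustration). Because both ends of the pattern are lookarounds, the whole match is the URL itself:

```php
<?php
// Sample markup from the question; $html is an illustrative variable name.
$html = '<a href="google.com/">test1</a> <a href="google.com/">test2</a> '
      . '<a href="http://www.amazingpage.com/">[link]</a> '
      . '<a href="google.com/">test3</a><a href="google.com/">test4</a>';

// [^<]*? cannot cross a "<", so a match cannot begin in an earlier tag.
$pattern = '/(?<=href=")[^<]*?(?=">\[link\])/';

if (preg_match($pattern, $html, $m) === 1) {
    echo $m[0], PHP_EOL; // http://www.amazingpage.com/
}
```

Escaping = and " is optional in PCRE, so href=" here behaves the same as href\=\" in the pattern above.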

Alternatively, capture the URL in a group. This code:

$string = '<a href="google.com/">test1</a> <a href="google.com/">test2</a> <a href="http://www.amazingpage.com/">[link]</a> <a href="google.com/">test3</a><a href="google.com/">test4</a>';
$regex = '/href="([^"]*)">\[link\]/i';
$result = preg_match($regex, $string, $matches);
var_dump($matches);

will return:

array(2) {
  [0] =>
  string(41) "href="http://www.amazingpage.com/">[link]"
  [1] =>
  string(27) "http://www.amazingpage.com/"
}

You can avoid regular expressions altogether and use the DOM to do this.

// Instantiate first: calling loadHTML() statically is an error in PHP 8.
$doc = new DOMDocument();
$doc->loadHTML('
     <a href="google.com/">test1</a>
     <a href="google.com/">test2</a>
     <a href="http://www.amazingpage.com/">[link]</a>
     <a href="google.com/">test3</a>
     <a href="google.com/">test4</a>
');

foreach ($doc->getElementsByTagName('a') as $link) {
   // Print the href of every link whose text is exactly "[link]".
   if ($link->nodeValue === '[link]') {
     echo $link->getAttribute('href');
   }
}

With DOMDocument and XPath:

$dom = new DOMDocument();
$dom->loadHTML($yourHTML);
$xpath = new DOMXPath($dom);

foreach ($xpath->query('//a[. = "[link]"]/@href') as $node) {
    echo $node->nodeValue;
}

Or, if you are looking for only one result:

$dom = new DOMDocument();
$dom->loadHTML($yourHTML);
$xpath = new DOMXPath($dom);

// The parentheses make [1] pick the first match in the whole document.
$nodeList = $xpath->query('(//a[. = "[link]"])[1]/@href');
if ($nodeList->length) {
    echo $nodeList->item(0)->nodeValue;
}

XPath query details:

//a              # 'a' tag everywhere in the DOM tree
[. = "[link]"]   # (condition) which has "[link]" as value 
/@href           # "href" attribute

The reason your regex pattern doesn't work:

The regex engine walks the string from left to right and tries to find a match starting at each position in turn. So even with a non-greedy quantifier, you always get the leftmost match: .*? keeps the match as short as possible from its starting point, but it does not move that starting point to the right.
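The leftmost-match behaviour can be observed directly. The sketch below runs the pattern from the question against the sample markup and asks preg_match for the offset of the match; $html, $text and $offset are illustrative names, not part of the original code:

```php
<?php
// Sample markup from the question; $html is an illustrative variable name.
$html = '<a href="google.com/">test1</a> <a href="google.com/">test2</a> '
      . '<a href="http://www.amazingpage.com/">[link]</a> '
      . '<a href="google.com/">test3</a><a href="google.com/">test4</a>';

// Pattern from the question, unchanged. PREG_OFFSET_CAPTURE makes each
// entry in $m a [matched text, byte offset] pair.
preg_match('/(?<=href\=\").*?(?=\[link\])/', $html, $m, PREG_OFFSET_CAPTURE);

[$text, $offset] = $m[0];

// Offset 9 is the position right after the first href=" in the string.
echo "match starts at offset $offset", PHP_EOL;
echo $text, PHP_EOL;
```

The match begins right after the first href=" and only stops lengthening once [link] follows, so it spans the first two links and the start of the third.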
