
Regex Noncapturing inside with OR

I'm fairly new to regex, so I'm having a couple of issues with the expressions I create.

I would like the part inside the quotes (aifwoenflkwenflk) in the expression below captured:

src="aifwoenflkwenflk"

I have the following expression I created myself:

((?<=src=)|(?<=href=))"(.*?)((?=")|(?='))"

It works, but there are two problems:

  1. It needs to capture only the inside of the quotes, but it captures both quotes (easy fix)
  2. I need it to support EITHER single or double quotes

I created a new expression that is able to do exactly what I want:

((?<=src=')|(?<=href=')|(?<=src=")|(?<=href="))(.*?)((?=")|(?='))

It is very long, though. There must be some way to shorten it so that it accepts single or double quotes and only captures the inside. Does anybody know how I can achieve this?

I appreciate all help!
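
For reference, the long expression above can be checked with a minimal PHP sketch like the following. The sample input and the variable names ($html, $pattern, $matches) are assumptions made for illustration, not part of the original post.

```php
<?php
// Hypothetical sample input with one single-quoted and one double-quoted attribute.
$html = "<img src='a.png'> <a href=\"b.html\">link</a>";

// The long expression from above. "~" is used as the delimiter, and the
// single quotes are escaped only because the PHP string is single-quoted.
$pattern = '~((?<=src=\')|(?<=href=\')|(?<=src=")|(?<=href="))(.*?)((?=")|(?=\'))~';

preg_match_all($pattern, $html, $matches);

// $matches[0] holds the full matches: the text between the quotes only,
// because the surrounding parts are zero-width lookarounds.
print_r($matches[0]);
```

One caveat with this approach: the closing lookahead accepts either quote character, so a double-quoted value that contains an apostrophe would be cut off at the apostrophe.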

As always, consider using a decent DOM parser instead, which also handles single quotes gracefully:

<?php

$data = <<<DATA
<a href="some string here">some link here</a>
<img src="some so'urce here">
<a href="some other string here">some link here</a>
DATA;

$doc = new DOMDocument();
$doc->loadHTML($data, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$xpath = new DOMXPath($doc);

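# links: print the href of every <a> element that has one
foreach ($xpath->query("//a[@href]") as $item) {
    $source = $item->getAttribute('href');
    echo $source, PHP_EOL;
}

# images: print the src of every <img> element that has one
foreach ($xpath->query("//img[@src]") as $item) {
    $source = $item->getAttribute('src');
    echo $source, PHP_EOL;
}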
?>

Alternatively, a shorter expression that captures the attribute value in group 1:

$regex = '/(?:src|href)=["\']?([^"\'>]+)["\']?/';
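
As a rough usage sketch, the regex above could be applied with preg_match_all as shown below. The sample input and the second pattern ($strict) are assumptions added for illustration: $strict is a variant, not from the original answer, that captures the opening quote and requires the same quote to close the value through the backreference \1.

```php
<?php
// Hypothetical sample input, including an apostrophe inside a double-quoted value.
$html = <<<'HTML'
<a href='first.html'>one</a>
<img src="some so'urce here">
HTML;

// The regex from above: the attribute value ends up in group 1.
$regex = '/(?:src|href)=["\']?([^"\'>]+)["\']?/';
preg_match_all($regex, $html, $m);
$values = $m[1];

// Assumed variant: group 1 is the opening quote, group 2 is the value,
// and \1 forces the closing quote to be the same character as the opening one.
$strict = '~(?:src|href)=(["\'])(.*?)\1~';
preg_match_all($strict, $html, $s);
$strictValues = $s[2];

print_r($values);
print_r($strictValues);
```

With this input, the character-class version stops at the apostrophe in the second value, while the backreference variant keeps the whole value. Neither is a substitute for a DOM parser on arbitrary HTML.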

