PHP Regex batch update

In short, here is my problem:

$text = 'Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industrys standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.';
$text = preg_replace('#(?<!((alt|src)="))Lorem(?!(.*("|<\/a>)))#i', '<a href="Lorem" title="Lorem" style="color: inherit;">\0</a>', $text);
$text = preg_replace('#(?<!((alt|src)="))Ipsum(?!(.*("|<\/a>)))#i', '<a href="Ipsum" title="Ipsum" style="color: inherit;">\0</a>', $text);
echo $text;

"Lorem" is replaced, but "Ipsum" is not.

Output of the PHP above:

 <a href="Lorem" title="Lorem" style="color: inherit;">Lorem</a> Ipsum is simply dummy text of the printing and typesetting industry. <a href="Lorem" title="Lorem" style="color: inherit;">Lorem</a> Ipsum has been the industrys standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing <a href="Lorem" title="Lorem" style="color: inherit;">Lorem</a> Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of <a href="Lorem" title="Lorem" style="color: inherit;">Lorem</a> <a href="Ipsum" title="Ipsum" style="color: inherit;">Ipsum</a>. 

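Why is "Ipsum" not replaced?

Edit:

If you comment out the first preg_replace line, the second preg_replace works correctly. PHP Fiddle 1 (press F9 to run)

Also, if you swap the two preg_replace calls, "Ipsum" is replaced and "Lorem" is not. PHP Fiddle 2

So, if the two words are not inside an anchor tag <a> to begin with, the lookbehind and lookahead conditions are not needed, or at least not in the second preg_replace; otherwise both lookaround conditions would hold. PHP Fiddle 3 (see note 1)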
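A minimal sketch of why the order matters, using a shortened sample string (an assumption for illustration, not the original paragraph). The lookahead `(?!(.*("|<\/a>)))` rejects a match whenever a `"` or `</a>` appears anywhere later on the same line, and the first replacement is what inserts those characters:

```php
<?php
// Shortened sample text (an assumption for illustration, not the original paragraph).
$text = 'Lorem Ipsum is dummy text. Lorem Ipsum again.';

// The lookahead from the question: fails if a " or </a> occurs anywhere later on the line.
$lookahead = '(?!(.*("|<\/a>)))';

// First pass: the text contains no quotes or </a> yet, so every "Lorem" is linked.
$text = preg_replace(
    '#(?<!((alt|src)="))Lorem' . $lookahead . '#i',
    '<a href="Lorem" title="Lorem" style="color: inherit;">\0</a>',
    $text, -1, $loremCount
);

// Second pass: the links inserted above now follow the first "Ipsum",
// so its lookahead fails. Only the last "Ipsum" has nothing but plain text after it.
$text = preg_replace(
    '#(?<!((alt|src)="))Ipsum' . $lookahead . '#i',
    '<a href="Ipsum" title="Ipsum" style="color: inherit;">\0</a>',
    $text, -1, $ipsumCount
);

echo "Lorem replaced: $loremCount, Ipsum replaced: $ipsumCount\n";
```

With this sample, both occurrences of "Lorem" are linked but only the final "Ipsum" is, which mirrors the output shown in the question.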


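Update:

As mentioned in the OP's comment, the approach above works for the string shown, but not when $text already contains <a> tags that enclose the same words, for example: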

 <a href="">test Lorem test</a>

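In this case it cannot be done with REGEX alone, IMHO. Instead, we need to do the following:

  1. Check the string $text for occurrences of the anchor tag <a>.
  2. Use an array $tempArr as temporary storage for the link elements.
  3. Replace each link element with placeholder text in a distinct format that carries a number as a unique ID, so the result is tempRep#0, tempRep#1, and so on, one placeholder per link element.
  4. Run the REGEX statement (see note 2).
  5. Now reverse step #3: replace tempRep#0, tempRep#1, etc. with their corresponding link elements, which were stored temporarily as elements of $tempArr, matching the number in each unique ID to the same array index (see note 3).

The algorithm above could be implemented in JavaScript, since it needs some Document Object Model inspection, but as the OP said, JavaScript is not an option. We therefore use the PHP Document Object Model by loading the string $text as HTML, with the following PHP DOM members: getElementsByTagName(), getAttribute(), and textContent (or nodeValue).

Finally, we have the following:

PHP Fiddle 4 [final version]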
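As a rough illustration of those DOM calls on their own, here is a small sketch. The sample string and the $found array are hypothetical, and the libxml_use_internal_errors() call is an added assumption to keep parser warnings about HTML fragments quiet:

```php
<?php
// Hypothetical sample input with one existing link (not the original text).
$html = 'Intro <a href="link1href" title="test1">test Lorem test</a> outro';

// Assumption: collect libxml warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($html);
libxml_clear_errors();

// Collect the attributes and text of every <a> element in the fragment.
$found = array();
foreach ($dom->getElementsByTagName('a') as $link) {
    $found[] = array(
        'href'  => $link->getAttribute('href'),
        'title' => $link->getAttribute('title'),
        'text'  => $link->textContent,
    );
}

print_r($found);
```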

$text = 'Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industrys standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of <a href="link1href" title="test1">test Ipsum Lorem test</a> Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of <a href="link2href" title="test2">test Lorem test</a> Lorem Ipsum.';

// Steps 1 and 2: find every existing <a> element and store its markup.
$dom = new DOMDocument;
$dom->loadHTML($text);
$tempArr = array();
$links = $dom->getElementsByTagName('a');
foreach ($links as $link) {
    $href = $link->getAttribute('href');
    $title = $link->getAttribute('title');
    $textCont = $link->textContent; // Alternatively, $link->nodeValue could be used
    // The rebuilt markup must match the source string exactly for str_replace() to find it.
    $linkElement = '<a href="' . $href . '" title="' . $title . '">' . $textCont . '</a>';
    $tempArr[] = $linkElement;
}

// Step 3: swap each stored link for a numbered placeholder.
for ($i = 0; $i < count($tempArr); $i++) {
    $text = str_replace($tempArr[$i], 'tempRep#' . $i, $text);
}

// Step 4: link the remaining words; \0 is the whole match.
$text = preg_replace('#(?<!(alt|src)=")(Lorem|Ipsum)(?!(("|<\/a>)))#i', '<a href="\0" title="\0" style="color: inherit;">\0</a>', $text);

// Step 5: put the original links back in place of the placeholders.
for ($i = 0; $i < count($tempArr); $i++) {
    $text = str_replace('tempRep#' . $i, $tempArr[$i], $text);
}
echo $text;

-----------------------------

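Notes:

  1. I found that the lookahead condition in the second preg_replace call is what causes the error. In PHP Fiddle 5 I kept the lookbehind and removed only the lookahead, and oddly enough it still works correctly.
  2. I have merged the two REGEX statements into one:

     $text = preg_replace('#(?<!(alt|src)=")(Lorem|Ipsum)(?!(("|<\/a>)))#i', '<a href="\0" title="\0" style="color: inherit;">\0</a>', $text);
  3. This is why we use a unique ID for each replacement.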
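One caveat that the notes above do not raise: the placeholders are unique, but tempRep#1 is also a prefix of tempRep#10, so restoring them one by one with str_replace() could go wrong once there are more than ten links. A possible way around this, sketched here with hypothetical sample data, is to restore all placeholders in a single strtr() call, which tries the longest keys first:

```php
<?php
// Eleven hypothetical stored links, so that both tempRep#1 and tempRep#10 exist.
$tempArr = array();
for ($i = 0; $i <= 10; $i++) {
    $tempArr[] = '<a href="link' . $i . '">link ' . $i . '</a>';
}

$text = 'first: tempRep#1, last: tempRep#10';

// Sequential str_replace(): tempRep#1 also matches the start of tempRep#10.
$sequential = $text;
for ($i = 0; $i < count($tempArr); $i++) {
    $sequential = str_replace('tempRep#' . $i, $tempArr[$i], $sequential);
}

// strtr() with an array tries the longest keys first, so tempRep#10 wins over tempRep#1.
$map = array();
foreach ($tempArr as $i => $linkElement) {
    $map['tempRep#' . $i] = $linkElement;
}
$restored = strtr($text, $map);

echo $sequential, "\n", $restored, "\n";
```

In this sketch the sequential version leaves a stray "0" after the second link, while the strtr() version restores link 10 intact. Giving the placeholder a closing delimiter (for example a trailing "#") would be another way to avoid the overlap.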
