
PHP Regex batch update

In short, here is my problem:

$text = 'Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industrys standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.';
$text = preg_replace('#(?<!((alt|src)="))Lorem(?!(.*("|<\/a>)))#i', '<a href="Lorem" title="Lorem" style="color: inherit;">\0</a>', $text);
$text = preg_replace('#(?<!((alt|src)="))Ipsum(?!(.*("|<\/a>)))#i', '<a href="Ipsum" title="Ipsum" style="color: inherit;">\0</a>', $text);
echo $text;

"Lorem" gets linked everywhere, but "Ipsum" does not (only its very last occurrence is linked).

The result of the PHP above:

 <a href="Lorem" title="Lorem" style="color: inherit;">Lorem</a> Ipsum is simply dummy text of the printing and typesetting industry. <a href="Lorem" title="Lorem" style="color: inherit;">Lorem</a> Ipsum has been the industrys standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing <a href="Lorem" title="Lorem" style="color: inherit;">Lorem</a> Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of <a href="Lorem" title="Lorem" style="color: inherit;">Lorem</a> <a href="Ipsum" title="Ipsum" style="color: inherit;">Ipsum</a>. 

Why doesn't "Ipsum" change?

Edited:

If you comment out the first preg_replace line, what used to be the second preg_replace works just fine. PHP Fiddle 1 (hit F9 to run).

Also, if you swap the two preg_replace calls, "Ipsum" is replaced but "Lorem" is not. PHP Fiddle 2

So, if the two words are not initially inside anchor tags <a>, you don't need the lookbehind and lookahead conditions, or at least not in the second preg_replace. Otherwise the <a> tags inserted by the first call put a " or </a> after almost every remaining word, so the negative lookahead of the second call rejects those matches. PHP Fiddle 3 (1)
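To make the effect concrete, here is a minimal sketch on a shortened sample string (the sample text and variable names are illustrative and not part of the original post). It runs the two original patterns in sequence and counts how many "Ipsum" matches survive, then runs a single merged pattern on the untouched text for comparison.

```php
<?php
// Shortened stand-in for the question's text (assumption: plain text, no tags yet).
$sample = 'Lorem Ipsum one. Lorem Ipsum two.';

// The two patterns from the question, unchanged.
$loremPattern = '#(?<!((alt|src)="))Lorem(?!(.*("|<\/a>)))#i';
$ipsumPattern = '#(?<!((alt|src)="))Ipsum(?!(.*("|<\/a>)))#i';

// First pass: the subject contains no quotes or </a>, so every "Lorem" matches.
$afterLorem = preg_replace(
    $loremPattern,
    '<a href="Lorem" title="Lorem" style="color: inherit;">\0</a>',
    $sample
);

// Second pass: the lookahead's ".*" scans to the end of the line, and the
// inserted tags mean a quote or </a> follows every "Ipsum" except the last one.
$afterBoth = preg_replace(
    $ipsumPattern,
    '<a href="Ipsum" title="Ipsum" style="color: inherit;">\0</a>',
    $afterLorem,
    -1,
    $ipsumCount
);
echo $ipsumCount, "\n"; // 1

// Single pass with an alternation: lookarounds are evaluated against the
// original subject, so text inserted for earlier matches cannot interfere.
$merged = preg_replace(
    '#(?<!(alt|src)=")(Lorem|Ipsum)(?!(("|<\/a>)))#i',
    '<a href="\0" title="\0" style="color: inherit;">\0</a>',
    $sample,
    -1,
    $mergedCount
);
echo $mergedCount, "\n"; // 4
```

Running both words through one preg_replace call sidesteps the ordering problem, but only while the input has no pre-existing <a> elements.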


Update:

As mentioned in a comment by the OP, the above runs into a problem if the string $text already contains <a> tags that include the same target words, something like:

 <a href="">test Lorem test</a>

In this case, regex alone won't do it IMHO. Instead we need to do the following:

  1. Check for any occurrence of anchor tags <a> in the string $text.
  2. Use an array, $tempArr, as temporary storage for the link elements.
  3. Replace every link element with placeholder text in a different form, using a number as a unique ID: tempRep#0, tempRep#1, etc., one in place of each link element.
  4. Run the regex statement(s) (2).
  5. Reverse step #3: replace tempRep#0, tempRep#1, etc. with the corresponding link elements stored in $tempArr, matching the number in each unique ID to the array index (3).

The above algorithm could be implemented in JavaScript, because it needs some Document Object Model checking, but as the OP said, JavaScript is not an option. So we use the PHP Document Object Model instead: load the string $text as HTML and use these PHP DOM members: getElementsByTagName(), getAttribute() and textContent (or alternatively, nodeValue).

So finally, we have the following:

PHP Fiddle 4 [Final]

$text = 'Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industrys standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of <a href="link1href" title="test1">test Ipsum Lorem test</a> Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of <a href="link2href" title="test2">test Lorem test</a> Lorem Ipsum.';

$dom = new DOMDocument;
$dom->loadHTML($text);
$tempArr = array();
$links = $dom->getElementsByTagName('a');
foreach ($links as $link) {  
    $href = $link->getAttribute('href');
    $title = $link->getAttribute('title');
    $textCont = $link->textContent; // Alternatively, $link->nodeValue could be used too
    $linkElement = '<a href="' . $href . '" title="' . $title . '">' . $textCont . '</a>';
    $tempArr[] = $linkElement;
}

for($i=0; $i < count($tempArr); $i++){
    $text = str_replace($tempArr[$i], 'tempRep#' . $i, $text);
}

$text = preg_replace('#(?<!(alt|src)=")(Lorem|Ipsum)(?!(("|<\/a>)))#i', '<a href="\0" title="\0" style="color: inherit;">\0</a>', $text);

for($i=0; $i < count($tempArr); $i++){
    $text = str_replace('tempRep#' . $i, $tempArr[$i], $text);
}
echo $text;

-----------------------------

Notes:

  1. I have found that only the lookahead condition in the second preg_replace call causes the bug. In PHP Fiddle 5 I kept the lookbehind and removed only the lookahead, and it works fine. The reason is the .* inside the lookahead: it scans the rest of the line, which after the first replacement contains the " and </a> of the inserted links.
  2. I've merged the 2 regex statements into one:

     $text = preg_replace('#(?<!(alt|src)=")(Lorem|Ipsum)(?!(("|<\/a>)))#i', '<a href="\0" title="\0" style="color: inherit;">\0</a>', $text);
  3. This is why we used a unique ID for each replacement.
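One caveat with numbered placeholders, offered as a hedged sketch rather than a correction of the answer: with more than ten links, restoring tempRep#1 through str_replace also hits the prefix of tempRep#10. The variation below is an assumption of mine, not the answer's code. It closes each placeholder with a trailing # and swaps with strtr(), which tries the longest keys first and does not rescan text it has already replaced. The generated links and the names $toPlaceholder and $toLink are hypothetical.

```php
<?php
// The collision in isolation: 'tempRep#1' is a prefix of 'tempRep#10'.
$naive = str_replace('tempRep#1', 'X', 'tempRep#10');
echo $naive, "\n"; // X0

// Eleven hypothetical links, separated by the word we want to link.
$links = [];
for ($i = 0; $i <= 10; $i++) {
    $links[] = '<a href="l' . $i . '">link ' . $i . '</a>';
}
$text = implode(' Lorem ', $links);

// Build both lookup tables; the trailing '#' terminates each ID unambiguously.
$toPlaceholder = [];
$toLink = [];
foreach ($links as $i => $link) {
    $placeholder = 'tempRep#' . $i . '#';
    $toPlaceholder[$link] = $placeholder;
    $toLink[$placeholder] = $link;
}

// Mask the existing links, run the regex, then restore the links.
$masked = strtr($text, $toPlaceholder);
$masked = preg_replace(
    '#(?<!(alt|src)=")(Lorem|Ipsum)(?!(("|<\/a>)))#i',
    '<a href="\0" title="\0" style="color: inherit;">\0</a>',
    $masked
);
$restored = strtr($masked, $toLink);
echo $restored, "\n";
```

For the two links in the example text the plain tempRep#0 and tempRep#1 placeholders are sufficient; the terminator only matters once the IDs reach two digits.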
