
Add trailing slash to links with preg_replace

I have some internal links in my site content that lack a trailing "/", and this is causing crawling issues for me. I want to do a search and replace on these links, so that https://www.example.com/slug becomes https://www.example.com/slug/. I am using the following function to push the entire content of a page through and replace all necessary links on the page:

function str_replace_links($subject, &$count) {
    //match the first part of the link http://www.example.com{/slug}
    $regex = '/(https:\/\/www.example.com)(\/[a-zA-Z_0-9\-]*)*';
    //check for the trailing '/' or if it is a file
    $regex .= '([^(\/|\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.torrent|\.ttf|\.woff|\.svg|\.eot|\.woff2)])';
    //finish off the regex
    $regex .= '/i';
    $i; // counter for # changed
    $content = preg_replace($regex, '$1$2/', $subject, 1, $i);
    $count += $i;
    return $content;
}

I tested it with a string containing a few links:

$string ='
<a href="https://www.example.com/slug1/page">1</a><br/>
<a href="https://www.example.com/slug2/page">2</a><br/>
<a href="https://www.example.com/slug1/page/">3</a><br/>
<a href="https://www.example.com/slug2/page/">4</a><br/>
<a href="https://www.example.com/">5</a><br/>
<a href="https://www.example.com">5b</a><br/>
<a href="https://www.example.com/style.css">6</a><br/>
<a href="https://www.example.com/style.jpg">7</a><br/>
<a href="https://www.example.com/style.png">8</a><br/>
<a href="https://www.example.com/style.pdf">9</a><br/>
';

echo str_replace_links($string, $switch);

However, this does not produce the expected result:

<a href="https://www.example.com/page/>1</a><br/>
<a href="https://www.example.com/page/>2</a><br/>
<a href="https://www.example.com//>3</a><br/>
<a href="https://www.example.com//>4</a><br/>
<a href="https://www.example.com//>5</a><br/>
<a href="https://www.example.com/>5b</a><br/>
<a href="https://www.example.com/st/le.css">6</a><br/>
<a href="https://www.example.com/st/le.jpg">7</a><br/>
<a href="https://www.example.com/st/le.png">8</a><br/>
<a href="https://www.example.com/st/le.pdf">9</a><br/>

Any help with the regex would be appreciated.
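One likely cause of the output above is that the bracket expression in the question is a character class, not a negated list of extensions: inside [...] the "(", "|", ")" and every letter are individual members, and the class always consumes exactly one character. The sketch below uses a cut-down class and a hypothetical helper, first_accepted_char, to show this behaviour; neither is taken from the question.

```php
<?php
// A cut-down version of the bracket expression from the question.
// Inside [...] the "(", "|", ")" and letters are single members, not alternatives.
$class = '/[^(\/|\.css|\.jpg)]/';

// Hypothetical helper: returns the first character the negated class accepts,
// or null when every character of $text is a member of the set.
function first_accepted_char(string $pattern, string $text): ?string
{
    return preg_match($pattern, $text, $m) === 1 ? $m[0] : null;
}

var_dump(first_accepted_char($class, '.css'));  // all four characters are members
var_dump(first_accepted_char($class, 'style')); // "s" is a member, "t" is not
```

Because that one character is consumed and not put back by '$1$2/', a letter goes missing in results such as st/le.css. A negative lookahead, for example (?!\.(?:css|jpg)\b), is the usual way to exclude whole extensions without consuming anything.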

You can use a tweaked URL validator to do it.

~(?i)(?<=")((?!mailto:)(?:[a-z]*:\\/\\/)?(?:\\S+(?::\\S*)?@)?(?:(?:(?:[1-9]\\d?|1\\d\\d|2[01]\\d|22[0-3])(?:\\.(?:1?\\d{1,2}|2[0-4]\\d|25[0-5])){2}(?:\\.(?:[1-9]\\d?|1\\d\\d|2[0-4]\\d|25[0-4]))|(?:(?:[a-z\\x{a1}-\\x{ffff}0-9]+-?)*[a-z\\x{a1}-\\x{ffff}0-9]+)(?:\\.(?:[a-z\\x{a1}-\\x{ffff}0-9]+-?)*[a-z\\x{a1}-\\x{ffff}0-9]+)*(?:\\.(?:[a-z\\x{a1}-\\x{ffff}]{2,})))|localhost)(:\\d{2,5})?(?:\\/(?:[^\\s/]*/)*[^\\s/.]+)?)(?=")~

https://regex101.com/r/GcT8ZU/1
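To make the idea concrete, here is a minimal sketch of how this pattern might be applied from PHP. It is one way to wire it up, not a definitive implementation. The function name add_trailing_slashes is hypothetical, the pattern is written with single backslashes inside a single-quoted string, and the trailing u modifier is my addition, on the assumption that PCRE rejects \x{ffff} outside UTF mode. Group 1 is the whole URL, so the replacement is simply '$1/'.

```php
<?php
// Hypothetical wrapper around the answer's pattern.
function add_trailing_slashes(string $html, &$count = 0): string
{
    // The pattern is split into pieces for readability only.
    // Assumption: the "u" modifier is added so that \x{ffff} compiles.
    $pattern = '~(?i)(?<=")('
        . '(?!mailto:)(?:[a-z]*:\/\/)?(?:\S+(?::\S*)?@)?'
        . '(?:(?:'
        // dotted IPv4 host
        . '(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}'
        . '(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))'
        // or a domain name that ends in a TLD
        . '|(?:(?:[a-z\x{a1}-\x{ffff}0-9]+-?)*[a-z\x{a1}-\x{ffff}0-9]+)'
        . '(?:\.(?:[a-z\x{a1}-\x{ffff}0-9]+-?)*[a-z\x{a1}-\x{ffff}0-9]+)*'
        . '(?:\.(?:[a-z\x{a1}-\x{ffff}]{2,}))'
        . ')|localhost)'
        . '(:\d{2,5})?'
        // optional path whose last segment has no "." and no trailing "/"
        . '(?:\/(?:[^\s/]*/)*[^\s/.]+)?'
        . ')(?=")~u';

    $result = preg_replace($pattern, '$1/', $html, -1, $n);
    if ($result === null) {
        return $html; // the pattern failed to compile or run; leave input as-is
    }
    $count += $n;
    return $result;
}

$html = '<a href="https://www.example.com/slug1/page">1</a>' . "\n"
      . '<a href="https://www.example.com/slug1/page/">3</a>' . "\n"
      . '<a href="https://www.example.com">5b</a>' . "\n"
      . '<a href="https://www.example.com/style.css">6</a>';
$changed = 0;
echo add_trailing_slashes($html, $changed), "\n";
```

Note that the pattern is not tied to www.example.com or to href attributes: any double-quoted value that looks like a host with an extension-less path is matched, so it may be worth restricting it before running it over real pages.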

Formatted

 (?i)

 (?<= " )
 (                             # (1 start)
      (?! mailto: )
      (?: [a-z]* :\/\/ )?
      (?:
           \S+ 
           (?: : \S* )?
           @
      )?
      (?:
           (?:
                (?:
                     [1-9] \d? 
                  |  1 \d\d 
                  |  2 [01] \d 
                  |  22 [0-3] 
                )
                (?:
                     \.
                     (?: 1? \d{1,2} | 2 [0-4] \d | 25 [0-5] )
                ){2}
                (?:
                     \.
                     (?:
                          [1-9] \d? 
                       |  1 \d\d 
                       |  2 [0-4] \d 
                       |  25 [0-4] 
                     )
                )
             |  (?:
                     (?: [a-z\x{a1}-\x{ffff}0-9]+ -? )*
                     [a-z\x{a1}-\x{ffff}0-9]+ 
                )
                (?:
                     \.
                     (?: [a-z\x{a1}-\x{ffff}0-9]+ -? )*
                     [a-z\x{a1}-\x{ffff}0-9]+ 
                )*
                (?:
                     \.
                     (?: [a-z\x{a1}-\x{ffff}]{2,} )
                )
           )
        |  localhost
      )
      ( : \d{2,5} )?                # (2)
      (?:
           \/
           (?: [^\s/]* / )*
           [^\s/.]+ 
      )?
 )                             # (1 end)
 (?= " )
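If only links to one known site need fixing, a narrower approach may be easier to maintain than a general URL validator. The sketch below is an alternative technique, not the pattern above: it matches double-quoted href values that start with a given base URL and lets pathinfo() decide whether the last path segment looks like a file. The function name add_slash_to_internal_links and the $base parameter are hypothetical.

```php
<?php
// Hypothetical helper: append "/" to internal links that lack it.
function add_slash_to_internal_links(
    string $html,
    string $base = 'https://www.example.com'
): string {
    // Match the base URL plus an optional path, but only inside href="...".
    // The lookahead stops the match before the closing quote, "?" or "#".
    $pattern = '~(?<=href=")' . preg_quote($base, '~')
             . '(/[^"?#]*)?(?=["?#])~i';

    return preg_replace_callback($pattern, function (array $m): string {
        $url  = $m[0];
        $path = $m[1] ?? ''; // unset when the URL has no path at all

        if ($path !== '' && substr($path, -1) === '/') {
            return $url; // already ends with a slash
        }
        if (pathinfo($path, PATHINFO_EXTENSION) !== '') {
            return $url; // last segment has an extension, treat it as a file
        }
        return $url . '/';
    }, $html);
}

$html = '<a href="https://www.example.com/slug1/page">1</a>' . "\n"
      . '<a href="https://www.example.com/slug1/page/">3</a>' . "\n"
      . '<a href="https://www.example.com/">5</a>' . "\n"
      . '<a href="https://www.example.com">5b</a>' . "\n"
      . '<a href="https://www.example.com/style.css">6</a>';
echo add_slash_to_internal_links($html), "\n";
```

Treating any dotted last segment as a file avoids maintaining a long extension list, at the cost of skipping slugs that legitimately contain a dot.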
