
Javascript RegEx to match URLs but exclude images

I need to replace all text links in a string of HTML text with actual clickable links. This works fine with the following RegEx:

/(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/gi

I then noticed it also replaces images and already formatted links. I figure I need to exclude links preceded by src=" or >. I searched a bit and read a lot about negative lookahead in many questions answered here. I tried this (added something right at the start of the pattern):

/(^(?!src="|>)\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/gi

But this doesn't match any link anymore. I tried several similar patterns, without the ^, changing some brackets, and so on, but nothing seems to work. I also tried putting .{0} between the part I added and \b, to make sure it would only look at what is right in front of the URL and not consider anything farther away.
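To make the problem concrete, here is a minimal sketch that applies the question's pattern (written with its outer capturing group) to a small HTML string. The helper name naiveLinkify and the sample string are made up for illustration and are not from the original post.

```javascript
// The question's URL pattern; group 1 is the whole URL.
const urlPattern = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/gi;

// Hypothetical helper: wrap every matched URL in an anchor tag.
function naiveLinkify(html) {
  return html.replace(urlPattern, '<a href="$1">$1</a>');
}

// Hypothetical sample: one bare URL and one URL inside an img tag.
const sample = 'Visit http://example.com/page and <img src="http://example.com/pic.png">';

// The bare URL is linked as intended, but the URL inside src="..." is
// wrapped too, which breaks the img tag.
console.log(naiveLinkify(sample));
```

The second replacement is the unwanted one: the URL that follows src=" should have been left alone.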

EDIT: The discussion was getting long, so I decided to update the answer instead.

Trusting that your original regex works, I'm just going to refer to a simplified version throughout the rest of this answer:

/\b(https?|ftp|file)/gi

Now, you attempted this:

/^(?!src="|>)\b(https?|ftp|file)/gi
 ^

The main error here is marked by a caret: the caret itself (^). It anchors your regex to the start of the input, so a URL can only match if it sits at the very beginning of the string, which is why it matched nothing. Let's remove that and move on:

/(?!src="|>)\b(https?|ftp|file)/gi

The main error, this time, is in your conception of lookahead assertions. As I explained in the comments, this assertion is redundant, because you are saying, "Match http or https or ftp or file, as long as none of these are src=" or >." A lookahead inspects the text that follows the current position, and here that text is the scheme itself, so the assertion can never fail. It's so redundant that the sentence barely makes sense! What you want, instead, is a lookbehind assertion:

/(?<!src="|>)\b(https?|ftp|file)/gi
   ^

Why? Because you wish to look for src=" or > behind the string you potentially wish to match. The problem? JavaScript doesn't support lookbehind assertions (that was true when this answer was written; ES2018 later added them). So, I suggested an alternative. Admittedly, it was flawed (although it was not the cause of the HTML breaking, as you brought up). Here it is, fixed:

/(.[^>"]|[^=]")\b(https?|ftp|file)/gi
  ^^^^^^^^^^^^

This is indeed a non-intuitive regex, and it warrants explanation. It splits our cases into two. Say we look at the two characters right before the URL. If that pair doesn't end in > or ", then we're not suspicious of it; we're good to go; match any URL that might follow. However, if it does end in ", the only "forgivable" case is where the first character is not an =. A pair ending in > is never accepted. So you see, a bit of logic trickery here.
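The two cases can be checked with a small sketch. It uses the simplified pattern from this answer without the g flag, so that test() stays stateless. The helper name accepts and the sample strings are hypothetical.

```javascript
// Simplified guard from the answer: a two-character prefix, then the scheme.
const guard = /(.[^>"]|[^=]")\b(https?|ftp|file)/i;

// Hypothetical helper: does the guard find a URL it is willing to match?
function accepts(text) {
  return guard.test(text);
}

console.log(accepts('see http://a.com'));               // true: prefix "e " does not end in > or "
console.log(accepts('say "http://a.com"'));             // true: prefix ends in " but does not start with =
console.log(accepts('<img src="http://a.com/x.png">')); // false: prefix is ="
console.log(accepts('<a href="x">http://a.com</a>'));   // false: prefix ends in >
console.log(accepts('http://a.com'));                   // false: no two characters precede the URL
```

The last case is a side effect of the trick worth keeping in mind: because the pattern consumes two characters before the scheme, a URL at the very start of the string is skipped.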

Now, as for why this might break your HTML: be sure to use JavaScript's replace, and substitute the first captured group back into the page! If you simply replace each match and drop that group, you end up "eating" the two-character prefixes, which we only meant to inspect, not destroy.

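html.replace(/(.[^>"]|[^=]")\b(https?|ftp|file)/gi,
             function(match, $1, $2, offset, original) {
                 // $1 is the two-character prefix we only inspected: always put it back.
                 // $2 is the matched scheme; append whatever you build for the URL after $1.
                 return $1;
             });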
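As a fuller sketch of that advice, the version below combines the two-character prefix with the question's complete URL pattern and writes the prefix back before the generated anchor. A lookbehind variant is shown next to it for engines that implement ES2018. The function names and sample strings are hypothetical, and the lookbehind form (?<!="|>) is my own restatement of the prefix rule rather than something taken from the answer.

```javascript
// Prefix trick applied to the full URL pattern.
// Group 1 is the inspected two-character prefix, group 2 is the URL itself.
const prefixGuarded = /(.[^>"]|[^=]")\b((?:https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/gi;

function linkifyWithPrefix(html) {
  return html.replace(prefixGuarded, function (match, prefix, url) {
    // Put the prefix back untouched, then emit the anchor for the URL.
    return prefix + '<a href="' + url + '">' + url + '</a>';
  });
}

// Lookbehind variant (ES2018 and later): nothing is consumed before the URL,
// so there is no prefix to restore. Group 1 is the URL.
const lookbehindGuarded = /(?<!="|>)\b((?:https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/gi;

function linkifyWithLookbehind(html) {
  return html.replace(lookbehindGuarded, function (match, url) {
    return '<a href="' + url + '">' + url + '</a>';
  });
}

const plain = 'Go to http://example.com now';
const tagged = '<img src="http://example.com/pic.png"> <a href="http://example.com">http://example.com</a>';

console.log(linkifyWithPrefix(plain));      // bare URL becomes a link
console.log(linkifyWithPrefix(tagged));     // left unchanged
console.log(linkifyWithLookbehind(plain));  // same result as the prefix version
console.log(linkifyWithLookbehind(tagged)); // left unchanged
```

The two variants differ at the start of the input: the prefix version needs two characters in front of the URL and therefore skips a URL at position 0, while the lookbehind version links it.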

I have to go home and haven't tested this yet, but I'd feel more comfortable tackling the easier task of first isolating the HTML you don't want to touch.

  1. Do a match to get an array of the stuff you don't want to deal with.
  2. Rip it all out with a split.
  3. Iterate over the split array, replace URLs, and then splice the matched items back in.
  4. Join and return.

The only assumption is that your text doesn't end on an anchor or img tag.

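function zipperParse(htmlText,matcher){
    var zipBackInArray = htmlText.match(matcher) || [], // match returns null when nothing is found
    workingArray = htmlText.split(matcher),
    i = workingArray.length;

    while(i--){
        // keep the processed text; buildAnchorTagIfURLPresent is your own URL replacer
        // and is assumed to return the updated string
        workingArray[i] = buildAnchorTagIfURLPresent(workingArray[i]); //You got this one covered
        // re-insert the excluded tag that preceded this piece (none precedes the first piece)
        if(i > 0){
            workingArray.splice(i,0,zipBackInArray.pop());
        }
        //working backwards makes splice much easier to use here
    }
    return workingArray.join('');
}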

var toExclude = /<a[^>]*>[^>]*>|<img[^>]*>/g;
// is supposed to match all img and anchor pairs but not handling tags inside anchors yet

zipperParse(yourHtmlText,toExclude);
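For reference, here is a self-contained sketch of the same split-and-zip idea that can be run as is. It interleaves the pieces with map instead of splice, which is a variation on the code above, not the original method. The body of buildAnchorTagIfURLPresent is an assumption (the answer leaves it to the asker), and the sample string is made up.

```javascript
// URL pattern taken from the question, without capture groups.
const urlPattern = /\b(?:https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/gi;

// Same exclusion pattern as above: whole anchors and img tags.
const toExclude = /<a[^>]*>[^>]*>|<img[^>]*>/g;

// Assumed implementation of the asker's helper: wrap bare URLs in anchors.
function buildAnchorTagIfURLPresent(text) {
  return text.replace(urlPattern, function (url) {
    return '<a href="' + url + '">' + url + '</a>';
  });
}

function zipperParse(htmlText, matcher) {
  const excluded = htmlText.match(matcher) || []; // tags to leave untouched
  const pieces = htmlText.split(matcher);         // text between those tags
  // Process each text piece, then append the excluded tag that followed it.
  return pieces
    .map(function (piece, i) {
      const tag = i < excluded.length ? excluded[i] : '';
      return buildAnchorTagIfURLPresent(piece) + tag;
    })
    .join('');
}

const input = 'See http://example.com <img src="http://example.com/a.png"> or ' +
              '<a href="http://example.com/x">http://example.com/x</a> end';
console.log(zipperParse(input, toExclude));
```

Only the first URL is wrapped; the img tag and the existing anchor pass through unchanged because they never reach the URL replacer.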

This code works for me. Just replace the Google API key placeholder (XXXXXXXXXXXXXXXXXXXXXX) with your own. I put it in the functions.php of my WordPress theme. The first thing to do is check how your Google Maps script URL appears on your site, and then adjust the replaced string to match it.

function remove_script_version( $src ) { 
$parts1 = explode( '?', $src );
$parts2 = str_replace('//maps.googleapis.com/maps/api/js', '//maps.googleapis.com/maps/api/js?language=es&#038;v=3.31&#038;libraries=places&#038;key=XXXXXXXXXXXXXXXXXXXXXX&#038;ver=3.31', $parts1);
return $parts2[0]; }
add_filter( 'script_loader_src', 'remove_script_version', 15, 1 );
add_filter( 'style_loader_src', 'remove_script_version', 15, 1 );
