
JavaScript REGEX: Is there a way to match slash after slash char in URL without negative lookbehind?

I'm trying to match a slash that is followed by another slash in a URL, as long as it is not part of the protocol or the query string.

Is there another way to do this with a regex but without lookbehind, since lookbehind is not supported in all browsers?

My examples:

    const urls = `
    https://asdf.com//asdf//asdf
    http://asdf.com//asdf//asdf
    ftp://asdf.com//asdf//asdf
    //asdf.com//asdf//asdf
    //asdf.com//asdf//asdf?test=//
    z39.50s://asdf//
    `.replace(/(?<!(^[\w\d-.]{2,}\:|^|\?.*))\/(?=\/)/gim, '');

    console.log(urls);

You may use

.replace(/^(\S*?\/\/)|(\?.*)$|(\/)+/g, '$1$2$3')

See this regex demo.

Details

  • ^(\S*?\/\/) - Group 1 (referred to as $1 in the replacement pattern): zero or more non-whitespace chars, as few as possible, from the start of the string up to and including the first //
  • | - or
  • (\?.*)$ - Group 2 ($2): a ? char and the rest of the string
  • | - or
  • (\/)+ - Group 3 ($3): captures a single / char, one or more times (each captured / overwrites the previous one in the group memory buffer, since it is a "repeated capturing group")
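
As a quick check, the pattern described above can be applied to a few of the sample URLs from the question. This is only a minimal sketch: the helper name `collapseSlashes` is hypothetical, and it assumes each URL is processed as its own string (a multi-line string like the one in the question would likely also need the `m` flag so that `^` and `$` match per line).

```javascript
// Hypothetical helper wrapping the single-regex approach described above.
function collapseSlashes(url) {
  // $1 keeps the leading "scheme://" (or bare "//"), $2 keeps the query string,
  // $3 holds the last "/" captured from a run of slashes, so the run collapses to one.
  // Groups that did not participate in a match are substituted as empty strings.
  return url.replace(/^(\S*?\/\/)|(\?.*)$|(\/)+/g, '$1$2$3');
}

const samples = [
  'https://asdf.com//asdf//asdf',
  '//asdf.com//asdf//asdf?test=//',
  'z39.50s://asdf//',
];

for (const url of samples) {
  console.log(url, '->', collapseSlashes(url));
}
```

One caveat with this sketch: for input that has no leading `//` at all (for example a relative path such as `a/b//c`), Group 1 consumes everything up to the first `//`, so that first double slash is kept.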

The usual workaround for the lack of lookbehind is to use a callback function in the replace call.
The reason is that you have to match the part you want to skip just to move the match position past it, and that requires logic in a callback function.
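
To illustrate the idea on a smaller, made-up case: suppose every `b` should be replaced unless it is preceded by `a`. Without lookbehind, the pattern can match `ab` first (the part to skip) and let the callback return it unchanged. The function name and the sample input below are hypothetical and chosen only for illustration.

```javascript
// Emulates /(?<!a)b/g without lookbehind: "ab" is matched (and captured)
// only so that the regex engine moves past it.
function replaceBNotAfterA(text, replacement) {
  return text.replace(/(ab)|b/g, function (match, skipped) {
    // If group 1 participated, this is the part to leave alone.
    return skipped !== undefined ? skipped : replacement;
  });
}

console.log(replaceBNotAfterA('ab b cb abb', 'X'));
```

The alternation order matters here: `(ab)` has to come first so that a protected `b` is consumed together with its `a` before the bare `b` alternative gets a chance to match it.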

In 99.99% of cases, you will have to do it this way if you have different replacements.

In your case it doesn't matter, because you have a single replacement and it is the empty string.
That can be handled by a combined group replacement, where the stripping is controlled by leaving the unwanted slash out of every group.

If you were to replace it with anything other than the empty string, a callback is the only way to do it.

To that end, here is your (mostly) unaltered regex used with a callback.

     (                             # (1 start)
          (?: ^ [\w\d\-.]{2,} : | ^ | \? .* )
          //
     )                             # (1 end)
  |  /
     (?= / )

    var urls = [
      'https://asdf.com//asdf//asdf',
      'http://asdf.com//asdf//asdf',
      'ftp://asdf.com//asdf//asdf',
      '//asdf.com//asdf//asdf',
      '//asdf.com//asdf//asdf?test=//',
      'z39.50s://asdf//'
    ];

    for (var i = 0; i < urls.length; i++) {
      urls[i] = urls[i].replace(
        /((?:^[\w\d\-.]{2,}:|^|\?.*)\/\/)|\/(?=\/)/gm,
        function(match, Grp1) {
          // Group 1 is a protected "//" (protocol or query string): keep it
          if ( Grp1 )
            return Grp1;
          // Otherwise this is an extra slash: drop it
          return '';
        }
      );
    }

    console.log( urls );
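
The same callback also allows a replacement other than the empty string, which the combined-group trick cannot express. The following is a sketch only: the helper name `replaceExtraSlashes` and the `[dup]` marker are hypothetical and chosen purely for illustration.

```javascript
// Hypothetical helper: replaces each "/" that is followed by another "/",
// except inside the protocol prefix or the query string.
function replaceExtraSlashes(url, replacement) {
  return url.replace(
    /((?:^[\w\d\-.]{2,}:|^|\?.*)\/\/)|\/(?=\/)/gm,
    function (match, protectedPart) {
      // Group 1 holds a protected "//" (with whatever precedes it); keep it as is.
      if (protectedPart) return protectedPart;
      // Otherwise the match is a single extra slash.
      return replacement;
    }
  );
}

var sample = 'https://asdf.com//asdf//asdf?test=//';
console.log(replaceExtraSlashes(sample, ''));
console.log(replaceExtraSlashes(sample, '[dup]'));
```

Because the callback's return value is inserted literally, the replacement string needs no escaping of `$` sequences here.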

