JavaScript RegExp returns unwanted characters

Question

I've got this string:

<AdParameters>
    <VpaidClickThrough><![CDATA[http://media.adrcdn.com/ads/exit.html]]></VpaidClickThrough>
    <VpaidClickTracking><![CDATA[]]></VpaidClickTracking> 
    <VpaidPath><![CDATA[http%3A%2F%2Fmedia.adrcdn.com%2Fads%2FAdrime%2F3130343734%2F61112%2F]]></VpaidPath> 
    <VpaidDuration><![CDATA[]]></VpaidDuration>
    <VpaidId><![CDATA[e322f52bc813f05beacb6fe522a52f20]]></VpaidId>
</AdParameters>
<MediaFiles>
    <MediaFile id="0" maintainAspectRatio="false" scalable="false" delivery="progressive"  width="640" height="360" apiFramework='VPAID' type="application/x-shockwave-flash">  <![CDATA[http%3A%2F%2Fmedia.adrcdn.com%2Fads%2FAdrime%2F3130343734%2F61112%2Fmediafile_lineair_640x360.swf?VpaidId=e322f52bc813f05beacb6fe522a52f20&VpaidPath=http%3A%2F%2Fmedia.adrcdn.com%2Fads%2FAdrime%2F3130343734%2F61112%2F]]></MediaFile>
</MediaFiles>

I want to extract all the encoded URLs from it, so I'm using this RegExp:

(http\%3A.*)\?|(http\%3A.*)\]\]

But what I get is this:

http%3A%2F%2Fmedia.adrcdn.com%2Fads%2FAdrime%2F3130343734%2F61112%2F]]
http%3A%2F%2Fmedia.adrcdn.com%2Fads%2FAdrime%2F3130343734%2F61112%2Fmediafile_lineair_640x360.swf?
http%3A%2F%2Fmedia.adrcdn.com%2Fads%2FAdrime%2F3130343734%2F61112%2F]]

That's nearly right, but I don't want the trailing "]]" and "?". How do I get the URLs without those ending characters?

It's strange, because when I try my regex at http://regex101.com/r/zS0tZ8 it seems to work perfectly.

Thank you in advance.

Answer 1

On regex101 I believe you are looking at the captured groups, but that's not all the regex returns: the match itself is whatever the whole regex matched, not only what's inside the parentheses.
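
To make the difference concrete, here is a small sketch that runs the pattern from the question against a shortened sample. The sample string and the example.com URLs are made up for illustration; only the pattern comes from the question.

```javascript
// Shortened, hypothetical sample in the same shape as the question's input.
const sample =
  "<VpaidPath><![CDATA[http%3A%2F%2Fexample.com%2Fads%2F]]></VpaidPath>\n" +
  "<MediaFile><![CDATA[http%3A%2F%2Fexample.com%2Fads%2Ffile.swf?" +
  "VpaidPath=http%3A%2F%2Fexample.com%2Fads%2F]]></MediaFile>";

// The pattern from the question, with the global flag added.
const pattern = /(http\%3A.*)\?|(http\%3A.*)\]\]/g;

// With the g flag, String.prototype.match returns whole matches only,
// so every entry still ends in "]]" or "?".
const wholeMatches = sample.match(pattern);

// exec() exposes the capture groups. Only one of the two groups takes part
// in each match, so pick whichever one is defined.
const captured = [];
let m;
pattern.lastIndex = 0;
while ((m = pattern.exec(sample)) !== null) {
  captured.push(m[1] !== undefined ? m[1] : m[2]);
}

console.log(wholeMatches);
console.log(captured);
```

The first array reproduces the output from the question; the second contains the same URLs without the trailing delimiters.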

This basically means you've got two ways of solving your issue:

Return the captured group. Your regex does the job all right; you just need to return the captured value rather than the whole match. (BTW, there's no need to escape ]] outside a character class. You can factor the two alternatives into (http%3A.*?)(?:\?|]]), where (?: ) is a non-capturing group, so the URL is always in the first group.)
Edit your regex so that the end delimiter isn't part of the match. A lookahead would work, like http%3A.*?(?=\?|]]) (notice there's no need for capturing parentheses anymore), but you could probably achieve the same thing with:
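```
http%3A[^\]?]*
```
[^ ] means "any character except those inside the brackets". In JavaScript the ] has to be escaped inside the class, because an unescaped [^] is a complete class of its own that matches any character.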
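
A quick check of the negated character class, as a sketch against a made-up sample (the example.com URLs are placeholders). It also shows what happens in JavaScript when the ] inside the class is left unescaped.

```javascript
// Shortened, hypothetical sample in the same shape as the question's input.
const sample =
  "<VpaidPath><![CDATA[http%3A%2F%2Fexample.com%2Fads%2F]]></VpaidPath>\n" +
  "<MediaFile><![CDATA[http%3A%2F%2Fexample.com%2Fads%2Ffile.swf?" +
  "VpaidPath=http%3A%2F%2Fexample.com%2Fads%2F]]></MediaFile>";

// "]" is escaped inside the class, so the class means
// "any character except ] and ?".
const urls = sample.match(/http%3A[^\]?]*/g);

// Unescaped, JavaScript reads "[^]" as "any character", followed by "?"
// (optional) and "]*", so each match is only "http%3A" plus one character.
const broken = sample.match(/http%3A[^]?]*/g);

console.log(urls);
console.log(broken);
```

Other regex flavors (PCRE, for instance) treat a ] right after [^ as a literal, which is probably why the unescaped form looks fine on some testers.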

Answer 2

There are a number of solutions to this, but this is what I prefer:

http%3A[\w%.]*

This just matches the characters that appear in the encoded URLs (word characters, % and .), without worrying about what comes afterward.
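
A sketch of how the whitelist behaves, using made-up example.com and my-site.example URLs. One caveat worth checking against real data: the class stops at the first character it doesn't list, and encodeURIComponent leaves characters such as - and ~ unencoded. The widened class below is my own assumption, not part of the answer.

```javascript
// Shortened, hypothetical sample in the same shape as the question's input.
const sample =
  "<VpaidPath><![CDATA[http%3A%2F%2Fexample.com%2Fads%2F]]></VpaidPath>\n" +
  "<MediaFile><![CDATA[http%3A%2F%2Fexample.com%2Fads%2Ffile.swf?" +
  "VpaidPath=http%3A%2F%2Fexample.com%2Fads%2F]]></MediaFile>";

// The match ends at the first character outside [\w%.], here "]" or "?".
const urls = sample.match(/http%3A[\w%.]*/g);

// A hyphen is not in the class, so a host like "my-site" gets cut short.
const hyphenated = "http%3A%2F%2Fmy-site.example%2Fads%2F";
const truncated = hyphenated.match(/http%3A[\w%.]*/)[0];

// Assumed variant: also allow "~" and "-" (the "-" goes last so it is literal).
const widened = hyphenated.match(/http%3A[\w%.~-]*/)[0];

console.log(urls);
console.log(truncated);
console.log(widened);
```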

Answer 3

http%3A.*?(?=\?|]])

should do the job.

EDIT: a little explanation:

(?=regex)

...tests the regex without adding the result to the match. It's called a "positive lookahead".
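
A small sketch of the lookahead in action, against a made-up sample (the example.com URLs are placeholders). It also checks that the delimiter is left in the input, right after the match, rather than being consumed.

```javascript
// Shortened, hypothetical sample in the same shape as the question's input.
const sample =
  "<VpaidPath><![CDATA[http%3A%2F%2Fexample.com%2Fads%2F]]></VpaidPath>\n" +
  "<MediaFile><![CDATA[http%3A%2F%2Fexample.com%2Fads%2Ffile.swf?" +
  "VpaidPath=http%3A%2F%2Fexample.com%2Fads%2F]]></MediaFile>";

const pattern = /http%3A.*?(?=\?|]])/g;

// After one exec() call, lastIndex sits at the end of the match,
// which is exactly where the delimiter that the lookahead saw begins.
const first = pattern.exec(sample);
const afterFirst = sample.slice(pattern.lastIndex, pattern.lastIndex + 2);

// match() with the g flag restarts from the beginning and returns every match.
const urls = sample.match(pattern);

console.log(first[0], afterFirst);
console.log(urls);
```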

Answer 4

I'm not sure how you used your RegExp, but this should work:

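function extractEncodedURLs(str) {
  // Group 1 is the encoded URL; the delimiter ("?" or "]]") is matched but not captured.
  var pattern = /(http%3A.*?)(?:\?|]])/g;

  var results = [];
  var match;
  // With the g flag, each exec() call resumes after the previous match.
  while ((match = pattern.exec(str)) !== null) {
    results.push(match[1]);
  }
  return results;
}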
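
On engines that support String.prototype.matchAll (ES2020), the same idea can probably be written without the explicit loop. This is only a sketch: the sample string and its example.com URLs are made up, and the function reuses the name from the answer for comparison.

```javascript
// Same idea as the loop above, using matchAll to iterate over the matches.
function extractEncodedURLs(str) {
  // matchAll requires the g flag; group 1 holds the URL without its delimiter.
  const pattern = /(http%3A.*?)(?:\?|]])/g;
  return Array.from(str.matchAll(pattern), (match) => match[1]);
}

// Shortened, hypothetical sample in the same shape as the question's input.
const sample =
  "<VpaidPath><![CDATA[http%3A%2F%2Fexample.com%2Fads%2F]]></VpaidPath>\n" +
  "<MediaFile><![CDATA[http%3A%2F%2Fexample.com%2Fads%2Ffile.swf?" +
  "VpaidPath=http%3A%2F%2Fexample.com%2Fads%2F]]></MediaFile>";

const urls = extractEncodedURLs(sample);
console.log(urls);
```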

4 answers

Answer 1: score 2, accepted, 2014-02-13 12:42:06
Answer 2: score 1, 2014-02-13 12:43:08
Answer 3: score 0, 2014-02-13 12:32:42
Answer 4: score 0, 2014-02-13 12:42:24
