Validating a proxy URL using an XML regex pattern

Question

I am using an XML regex pattern to validate my proxy URL.

Example proxy: ab-proxy-sample.company.com:8080

My requirements:

1. It should not begin with http:// or https:// (matched as a whole word).
2. It should accept any string followed by a port.
3. It should still accept strings that merely start with "ht".

My current XML regex is: [^http://|https://].+:[0-9]+|

But it is matching each letter individually instead of the whole word.

Any help would be highly appreciated. Thanks in advance!

Answer 1

As @arnep points out, you're attempting to use a negated character class with alternation, which isn't the way it works. A character class matches exactly one character from a set, so the | and the repeated letters inside the brackets are simply more members of that set. It is also worth reading up on lookaheads.
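To make the character-class point concrete, here is a small sketch. It uses Python's re module purely for illustration (an assumption; it is not the XML engine from the question), and it drops the trailing | of the original pattern so the effect of the brackets is easier to see. The helper name accepted is hypothetical.

```python
import re

# The asker's pattern without its trailing "|". The brackets form ONE negated
# character class: "any single character except h, t, p, s, :, / or |".
ASKER_PATTERN = re.compile(r"[^http://|https://].+:[0-9]+")


def accepted(value):
    """Return True if the whole value matches the asker's pattern."""
    return ASKER_PATTERN.fullmatch(value) is not None


for candidate in [
    "ab-proxy-sample.company.com:8080",    # first char "a" is outside the set
    "htab-proxy-sample.company.com:8080",  # first char "h" is inside the set
    "proxy.company.com:8080",              # first char "p" is inside the set
]:
    print(candidate, accepted(candidate))
```

The last two candidates are rejected only because of their first letter, which is the "matching each letter" behaviour described in the question.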

I'm sure someone else will post an answer you can copy and paste, but this is a useful opportunity to learn the basics of regex!

UPDATE:

I didn't realize that you were using an engine that doesn't support negative lookarounds. Without negative lookarounds, it's nearly impossible to achieve what you're trying to do.

Nearly ;)

Here is a "brute-force" combinatorial method of doing it:

(?:[^h]|h(?:[^t]|t(?:[^t]|t(?:[^p]|p(?:[^s:]|s(?:[^:]|:(?:[^\/]|\/(?:[^\/])))|:(?:[^\/]|\/(?:[^\/])))))))\S+:\d+

If the XML engine doesn't support non-capturing groups, i.e. (?: ... ), then use regular groups instead:
```
([^h]|h([^t]|t([^t]|t([^p]|p([^s:]|s([^:]|:([^\/]|\/([^\/])))|:([^\/]|\/([^\/])))))))\S+:\d+
```
If the XML engine doesn't support character classes like \S and \d, then use [^ \t\r\n] and [0-9] instead.

Here is a running example: http://rubular.com/r/JnpCVgeLmL. Try changing the test string. You'll see that:

    ab-proxy-sample.company.com:8080          # matches
    htab-proxy-sample.company.com:8080        # matches
    http://ab-proxy-sample.company.com:8080   # doesn't
    https://ab-proxy-sample.company.com:8080  # doesn't
    httpd://ab-proxy-sample.company.com:8080  # matches

Note that you do not need the ^ and $. I added these specifically for the Rubular demo, but apparently the XML engine assumes this condition (the pattern is implicitly anchored to the whole value).
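The same check can be reproduced offline. The sketch below is an illustration in Python, not the XML engine itself: it uses the capturing-group variant of the pattern with "/" left unescaped (Python does not need the backslash), and re.fullmatch stands in for the implicit anchoring. The names BRUTE_FORCE, is_valid_proxy and SAMPLES are hypothetical.

```python
import re

# Capturing-group variant of the brute-force pattern, split in two pieces:
# the prefix guard, then "non-whitespace, colon, digits".
BRUTE_FORCE = re.compile(
    r"([^h]|h([^t]|t([^t]|t([^p]|p([^s:]|s([^:]|:([^/]|/([^/])))|:([^/]|/([^/])))))))"
    r"\S+:\d+"
)


def is_valid_proxy(value):
    """Return True if the whole value matches (emulates implicit anchoring)."""
    return BRUTE_FORCE.fullmatch(value) is not None


SAMPLES = [
    "ab-proxy-sample.company.com:8080",
    "htab-proxy-sample.company.com:8080",
    "http://ab-proxy-sample.company.com:8080",
    "https://ab-proxy-sample.company.com:8080",
    "httpd://ab-proxy-sample.company.com:8080",
]

for sample in SAMPLES:
    print(f"{sample:45} {is_valid_proxy(sample)}")
```

Whole-string matching matters here: an unanchored search could start partway through "http://..." and still find a match.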

How does this work? It's easier to understand if we break it up like this:

    ([^h] | h
    ([^t] | t
    ([^t] | t
    ([^p] | p
    ([^s:]| s ([^:]|:([^\/]|\/([^\/])))
          | :        ([^\/]|\/([^\/])))
    ))))
    \S+:\d+

The explanation:

If the first char isn't an "h", then great! (The string can't possibly be "http://" or "https://".)
If the first char is an "h" though, then:
1. If the second char isn't a "t", then great! (The string can't possibly be "http://" or "https://".)
2. If the second char is a "t" though, then:
  1. If the third char isn't a "t", great!
  2. If the third char is a "t", then:
    1. If the fourth char isn't a "p", great!
    2. If the fourth char is a "p", then:

Here it gets tricky: now we encounter three branches.

If the fifth char is neither an "s" nor a ":", then great!
If the fifth char is an "s" though, then:
1. If the sixth char isn't a ":", then great!
2. If the sixth char is a ":" though, then:
  1. If the seventh char isn't a "/", then great!
  2. If the seventh char is a "/" though, then:
    1. If the eighth char isn't a "/", then great!
    2. Otherwise, fail! We found an "https://".
If the fifth char is a ":" though, then:
1. If the sixth char isn't a "/", then great!
2. If the sixth char is a "/" though, then:
  1. If the seventh char isn't a "/", then great!
  2. Otherwise, fail! We found an "http://".

And finally, if we've gotten this far, then we look for a string of non-whitespace characters, followed by a colon, followed by a string of digits.
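For literal prefixes, the branching described above is regular enough to generate mechanically. The following is a rough sketch in Python, not part of the original answer: it stores the forbidden prefixes in a trie and emits one "[^x]|x(...)" level per character. The helper names build_trie and reject_prefixes are hypothetical, and the sketch assumes every forbidden prefix is non-empty.

```python
import re


def build_trie(prefixes):
    """Store the forbidden prefixes in a nested-dict trie; None marks an end."""
    root = {}
    for prefix in prefixes:
        node = root
        for ch in prefix:
            node = node.setdefault(ch, {})
        node[None] = True
    return root


def reject_prefixes(node):
    """Build a lookaround-free pattern that fails once a forbidden prefix completes."""
    chars = sorted(ch for ch in node if ch is not None)
    # Any other character at this position rules out every forbidden prefix.
    branches = ["[^" + "".join(re.escape(ch) for ch in chars) + "]"]
    for ch in chars:
        child = node[ch]
        if None in child:
            continue  # this character would complete a forbidden prefix: no branch
        branches.append(re.escape(ch) + "(" + reject_prefixes(child) + ")")
    return "|".join(branches)


guard = reject_prefixes(build_trie(["http://", "https://"]))
PROXY = re.compile("(" + guard + r")\S+:\d+")
print(guard)
```

Like the hand-written pattern, the generated guard consumes the characters it inspects, so it only makes sense as the start of a longer pattern such as the \S+:\d+ tail used here.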

I leave it to a smarter mathematician than myself to ponder whether all strings matchable using lookarounds can be "brute-forced" in such a way.

Answer 2

To avoid matching a string starting with some word, use a negative lookahead:

^(?!https?).*$

will match any string that does not start with http(s). The other requirements are left to the reader as an exercise :-)
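A quick sketch of how that lookahead behaves, in Python (an assumption for illustration; as noted in Answer 1, the asker's XML engine does not support lookarounds, so this only applies to engines that do). LOOSE is the pattern from this answer; STRICT is a hypothetical refinement that adds the "://" and the port requirement.

```python
import re

# Pattern from the answer: rejects anything that starts with "http".
LOOSE = re.compile(r"^(?!https?).*$")

# Hypothetical refinement: reject only the full schemes and require a port.
STRICT = re.compile(r"^(?!https?://)\S+:\d+$")

for sample in [
    "ab-proxy-sample.company.com:8080",
    "http://ab-proxy-sample.company.com:8080",
    "httpd://ab-proxy-sample.company.com:8080",
]:
    print(sample, bool(LOOSE.match(sample)), bool(STRICT.match(sample)))
```

The difference shows up on the "httpd://" sample: the loose pattern rejects it because it starts with "http", while the stricter one accepts it, matching the behaviour listed in Answer 1.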
