Validate proxy URL using XML regex pattern
I am using an XML regex pattern to match my proxy URL.
e.g. Proxy: ab-proxy-sample.company.com:8080
My requirement:
My current XML regex is: [^http://|https://].+:[0-9]+|
But it's matching each letter instead of the whole word?
Any help would be highly appreciated. Thanks in advance!
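As @arnep points out, you're attempting to use alternation inside a negated character class, which isn't the way it works. [^http://|https://] is a single character class: it matches exactly one character that is not h, t, p, s, :, / or |. Inside square brackets, | is just another literal character, not alternation. Also, here is some information regarding lookaheads.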
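A quick way to see what the question's pattern really does is to run it. Below is a minimal Python sketch, assuming Python's re as a stand-in for the XML engine and using re.fullmatch to imitate the implicit anchoring of XML patterns. Apart from the proxy from the question, the sample strings are made up for illustration.

```python
import re

# The pattern from the question. Inside [...] the characters form a set, so
# [^http://|https://] means "exactly one character that is not h, t, p, s, :, / or |".
# The trailing | adds an empty alternative, so the empty string is accepted too.
QUESTION_PATTERN = re.compile(r"[^http://|https://].+:[0-9]+|")


def accepts(value):
    # fullmatch: the whole value must match, as with an XML Schema pattern
    return QUESTION_PATTERN.fullmatch(value) is not None


print(accepts("ab-proxy-sample.company.com:8080"))          # True: 'a' is not in the set
print(accepts("proxy.company.com:8080"))                    # False: 'p' is in the set
print(accepts("xhttp://ab-proxy-sample.company.com:8080"))  # True: only the first character is tested
print(accepts("http://ab-proxy-sample.company.com:8080"))   # False: 'h' is in the set
print(accepts(""))                                          # True: the empty alternative
```

The class never compares whole words; it only filters the first character. That is why a valid host starting with p or s is rejected, while anything prefixed with an unlisted character slips through.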
I'm sure someone else will post an answer you can copy and paste, but this is a useful opportunity to learn the basics of regex!
UPDATE:
I didn't realize that you were using an engine that doesn't support negative lookarounds. Without negative lookarounds, it's nearly impossible to achieve what you're trying to do.
Nearly ;)
Here is a "brute-force" combinatoric method of doing it:
(?:[^h]|h(?:[^t]|t(?:[^t]|t(?:[^p]|p(?:[^s:]|s(?:[^:]|:(?:[^\/]|\/(?:[^\/])))|:(?:[^\/]|\/(?:[^\/])))))))\S+:\d+
If the XML engine doesn't support non-capturing groups, i.e. (?: ... ) (standard XML Schema patterns don't), then use regular groups instead:
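([^h]|h([^t]|t([^t]|t([^p]|p([^s:]|s([^:]|:([^/]|/([^/])))|:([^/]|/([^/])))))))\S+:\d+
(The slashes are also left unescaped here: in XML Schema, / is an ordinary character and \/ is not a valid escape.)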
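If the XML engine doesn't support character classes like \S and \d, then use [^ \t\r\n] and [0-9] instead. (Standard XML Schema patterns do support both; there \S is exactly [^ \t\r\n], while \d matches any Unicode decimal digit, so [0-9] is the stricter choice for a port number.)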
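As a sanity check that the three spellings behave the same, here is a small Python sketch. Python's re is only a stand-in for the XML engine here (it happens to accept all three forms), and re.fullmatch imitates the implicit anchoring of XML patterns. The samples are the ones from the Rubular demo. Caveat: Python's \S and \d are Unicode-aware, so the explicit-class variant is only guaranteed to agree on plain ASCII input like these samples.

```python
import re

# Variant 1: the answer's pattern with non-capturing groups (Python accepts \/).
NON_CAPTURING = r"(?:[^h]|h(?:[^t]|t(?:[^t]|t(?:[^p]|p(?:[^s:]|s(?:[^:]|:(?:[^\/]|\/(?:[^\/])))|:(?:[^\/]|\/(?:[^\/])))))))\S+:\d+"
# Variant 2: plain (capturing) groups, slashes unescaped.
CAPTURING = r"([^h]|h([^t]|t([^t]|t([^p]|p([^s:]|s([^:]|:([^/]|/([^/])))|:([^/]|/([^/])))))))\S+:\d+"
# Variant 3: as variant 2, with \S and \d spelled out as explicit classes.
EXPLICIT = r"([^h]|h([^t]|t([^t]|t([^p]|p([^s:]|s([^:]|:([^/]|/([^/])))|:([^/]|/([^/])))))))[^ \t\r\n]+:[0-9]+"

# Sample strings from the Rubular demo (ASCII only).
SAMPLES = [
    "ab-proxy-sample.company.com:8080",
    "htab-proxy-sample.company.com:8080",
    "http://ab-proxy-sample.company.com:8080",
    "https://ab-proxy-sample.company.com:8080",
    "httpd://ab-proxy-sample.company.com:8080",
]


def outcomes(pattern):
    # fullmatch imitates the implicit anchoring of an XML Schema pattern
    compiled = re.compile(pattern)
    return [compiled.fullmatch(s) is not None for s in SAMPLES]


# All three variants accept and reject exactly the same samples.
assert outcomes(NON_CAPTURING) == outcomes(CAPTURING) == outcomes(EXPLICIT)
print(outcomes(NON_CAPTURING))  # [True, True, False, False, True]
```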
Here is a running example: http://rubular.com/r/JnpCVgeLmL. Try changing the test string. You'll see that...
ab-proxy-sample.company.com:8080 # matches
htab-proxy-sample.company.com:8080 # matches
http://ab-proxy-sample.company.com:8080 # doesn't
https://ab-proxy-sample.company.com:8080 # doesn't
httpd://ab-proxy-sample.company.com:8080 # matches
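The Rubular results listed above can also be reproduced offline. A minimal Python sketch, assuming Python's re as a stand-in for the XML engine (it understands the (?: ... ), \S and \d syntax of the brute-force pattern) and using re.fullmatch because an XML pattern has to match the whole value:

```python
import re

# The brute-force pattern from the answer, copied as-is.
BRUTE_FORCE = re.compile(
    r"(?:[^h]|h(?:[^t]|t(?:[^t]|t(?:[^p]|p(?:[^s:]|s(?:[^:]|:(?:[^\/]|\/(?:[^\/])))|:(?:[^\/]|\/(?:[^\/])))))))\S+:\d+"
)

# Test strings and the outcomes reported for the Rubular demo.
EXPECTED = {
    "ab-proxy-sample.company.com:8080": True,
    "htab-proxy-sample.company.com:8080": True,
    "http://ab-proxy-sample.company.com:8080": False,
    "https://ab-proxy-sample.company.com:8080": False,
    "httpd://ab-proxy-sample.company.com:8080": True,
}

for text, expected in EXPECTED.items():
    # fullmatch: the whole string must match, like an implicitly anchored XML pattern
    matched = BRUTE_FORCE.fullmatch(text) is not None
    label = "matches" if matched else "doesn't"
    print(f"{text:45} # {label}")
    assert matched == expected
```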
Note that you do not need the ^ and $. I added these specifically for the Rubular demo; the XML engine assumes this condition (anchoring) on its own, because an XML Schema pattern must match the entire value.
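Why the anchoring matters can be shown with a short Python sketch (Python's re standing in for the XML engine): an unanchored search is free to start the match later in the string, so it "finds" a proxy inside a URL that the anchored pattern correctly rejects.

```python
import re

# The brute-force pattern from the answer, without ^ and $.
PATTERN = r"(?:[^h]|h(?:[^t]|t(?:[^t]|t(?:[^p]|p(?:[^s:]|s(?:[^:]|:(?:[^\/]|\/(?:[^\/])))|:(?:[^\/]|\/(?:[^\/])))))))\S+:\d+"
url = "http://ab-proxy-sample.company.com:8080"

# Anchored, like an XML pattern: the whole value has to match.
print(re.fullmatch(PATTERN, url))  # None
# Unanchored search may start at the first "t", where [^h] succeeds.
print(re.search(PATTERN, url).group(0))  # ttp://ab-proxy-sample.company.com:8080
# Adding ^ and $, as in the Rubular demo, restores the anchoring for search().
print(re.search("^" + PATTERN + "$", url))  # None
```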
How does this work? It's easier to understand if we break it up like this:
([^h] | h
  ([^t] | t
    ([^t] | t
      ([^p] | p
        ([^s:]| s ([^:]|:([^\/]|\/([^\/])))
              | : ([^\/]|\/([^\/])))
  ))))
\S+:\d+
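The broken-out layout is not just for reading: with Python's re.VERBOSE flag (used here purely as an illustration; XML Schema has no such mode), whitespace and # comments in a pattern are ignored, so the same layout compiles to the same pattern. The comments below are my annotations, and the slashes are written unescaped.

```python
import re

# The broken-out layout, compiled with re.VERBOSE so that the
# whitespace and the comments are ignored.
LAYOUT = re.compile(r"""
    ( [^h] | h                          # char 1: any non-h ends the check; h goes deeper
      ( [^t] | t                        # char 2: any non-t ends the check
        ( [^t] | t                      # char 3: any non-t ends the check
          ( [^p] | p                    # char 4: any non-p ends the check
            ( [^s:] | s ( [^:] | : ( [^/] | / ( [^/] ) ) )   # s: rule out 'https://'
                    | :   ( [^/] | / ( [^/] ) ) )            # ':': rule out 'http://'
          ) ) ) )
    \S+ : \d+                           # then non-whitespace, a colon, digits
""", re.VERBOSE)

for text in ["ab-proxy-sample.company.com:8080", "http://ab-proxy-sample.company.com:8080"]:
    print(text, LAYOUT.fullmatch(text) is not None)
```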
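The explanation:
First character: anything other than h means the string cannot start with http://, so we move on to the rest of the pattern; an h sends us one level deeper.
Second, third and fourth characters: the same idea with t, t and p. Any unexpected character ends the check; the expected one sends us one level deeper.
Here, it gets tricky: after http we encounter three branches. A character that is neither s nor : ends the check. An s means the next character must not be :, or if it is :, the one after must not be /, or if that is /, the next one must not be / either (this rules out https://). A : gets the same two-slash treatment (this rules out http://).
And finally, if we've gotten this far, then we look for a string of non-whitespace characters, followed by a colon, followed by a string of digits.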
I leave it to a smarter mathematician than myself to ponder whether everything matchable using lookarounds can be "brute-forced" in such a way.
To avoid matching a string starting with some word, use a negative lookahead:
^(?!https?).*$
will match any string that does not start with http (the optional s changes nothing here, because every string starting with https also starts with http). The other requirements are left to the reader as an exercise :-)
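For engines that do support lookahead (Python's re does; standard XML Schema patterns do not), here is one possible way to finish the exercise, as a sketch only. It is my own variant, not the answer's: the lookahead is narrowed to https?:// so that only a scheme prefix is rejected, and the host:port requirement is borrowed from the brute-force answer's \S+:\d+. The host http-proxy.company.com is a hypothetical example.

```python
import re

# The lookahead from the answer: rejects anything that starts with "http".
ANSWER = re.compile(r"^(?!https?).*$")
# Assumed narrower variant: reject only an http:// or https:// prefix,
# then require non-whitespace, a colon and digits.
PROXY = re.compile(r"(?!https?://)\S+:\d+")


def is_proxy(value):
    # fullmatch anchors the pattern at both ends
    return PROXY.fullmatch(value) is not None


for value in [
    "ab-proxy-sample.company.com:8080",
    "https://ab-proxy-sample.company.com:8080",
    "http-proxy.company.com:8080",  # hypothetical host whose name starts with "http"
]:
    print(value, is_proxy(value), ANSWER.match(value) is not None)
```

The last row shows the trade-off: the answer's ^(?!https?) also rejects a host whose name merely begins with http, while the narrower lookahead only rejects a scheme.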