I am using the XML regex pattern to match my proxy URL.
eg: Proxy : ab-proxy-sample.company.com:8080
My requirement :
My current XML regex is : [^http://|https://].+:[0-9]+|
But it's matching each letter instead of the whole word.
Any help would be highly appreciated. Thanks in advance !
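As @arnep points out, you're attempting to use a negated character class with alternation, which isn't the way it works: everything between [ and ] is treated as a set of single characters, so [^http://|https://] only means "one character that is not h, t, p, s, :, / or |". Also, here is some information regarding lookaheads.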
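To see the character-class problem concretely, here is a minimal sketch. It uses Python's re module as a stand-in for the XML engine (an assumption; the two dialects differ in other respects) and drops the trailing | from the question's pattern on the assumption that it is a stray character. The names ASKER_PATTERN, EQUIVALENT_PATTERN and matches are made up for the illustration.

```python
import re

# The pattern from the question, without the trailing "|" (assumed stray).
ASKER_PATTERN = r"[^http://|https://].+:[0-9]+"

# A bracket expression is a set of single characters, so the class above
# means "any one character except h, t, p, s, :, / or |".
EQUIVALENT_PATTERN = r"[^htps:/|].+:[0-9]+"


def matches(pattern: str, text: str) -> bool:
    """Whole-string match, mimicking an implicitly anchored engine."""
    return re.fullmatch(pattern, text) is not None


for text in (
    "ab-proxy-sample.company.com:8080",
    "http://ab-proxy-sample.company.com:8080",
    "proxy.company.com:8080",  # rejected only because it starts with "p"
):
    print(text, matches(ASKER_PATTERN, text), matches(EQUIVALENT_PATTERN, text))
```

The third test string shows the real damage: a perfectly good host name is rejected just because its first letter happens to be one of the letters in "https".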
I'm sure someone else will post an answer you can copy and paste, but this is a useful opportunity to learn the basics of regex!
UPDATE:
I didn't realize that you were using an engine that doesn't support negative lookarounds. Without negative lookarounds, it's nearly impossible to achieve what you're trying to do.
Nearly ;)
Here is a "brute-force" combinatoric method of doing it:
(?:[^h]|h(?:[^t]|t(?:[^t]|t(?:[^p]|p(?:[^s:]|s(?:[^:]|:(?:[^\/]|\/(?:[^\/])))|:(?:[^\/]|\/(?:[^\/])))))))\S+:\d+
If the XML engine doesn't support non-capturing groups, i.e. (?: ... ), then use regular groups instead:
([^h]|h([^t]|t([^t]|t([^p]|p([^s:]|s([^:]|:([^\/]|\/([^\/])))|:([^\/]|\/([^\/])))))))\S+:\d+
If the XML engine doesn't support character classes like \S and \d, then use [^ \t\r\n] and [0-9] instead.
Here is a running example: http://rubular.com/r/JnpCVgeLmL . Try changing the test string. You'll see that...
ab-proxy-sample.company.com:8080 # matches
htab-proxy-sample.company.com:8080 # matches
http://ab-proxy-sample.company.com:8080 # doesn't
https://ab-proxy-sample.company.com:8080 # doesn't
httpd://ab-proxy-sample.company.com:8080 # matches
Note that you do not need the ^ and $ anchors. I added these specifically for the Rubular demo, but apparently the XML engine assumes this condition (anchored-ness).
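If you would rather check the pattern locally than on Rubular, the following sketch runs the same five test strings through Python's re module. Python is only a stand-in for the XML engine here (an assumption), and re.fullmatch is used to imitate the implicit anchoring. The names BRUTE_FORCE, SAMPLES and is_allowed are invented for the example.

```python
import re

# The non-capturing version of the pattern, copied verbatim.
BRUTE_FORCE = re.compile(
    r"(?:[^h]|h(?:[^t]|t(?:[^t]|t(?:[^p]|p(?:[^s:]|s(?:[^:]|:(?:[^\/]|\/(?:[^\/])))|:(?:[^\/]|\/(?:[^\/])))))))\S+:\d+"
)

SAMPLES = [
    "ab-proxy-sample.company.com:8080",
    "htab-proxy-sample.company.com:8080",
    "http://ab-proxy-sample.company.com:8080",
    "https://ab-proxy-sample.company.com:8080",
    "httpd://ab-proxy-sample.company.com:8080",
]


def is_allowed(candidate: str) -> bool:
    """True when the whole string matches, as an anchored engine would require."""
    return BRUTE_FORCE.fullmatch(candidate) is not None


for sample in SAMPLES:
    print(sample, "# matches" if is_allowed(sample) else "# doesn't")
```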
How does this work? It's easier to understand if we break it up like this:
([^h] | h
([^t] | t
([^t] | t
([^p] | p
([^s:]| s ([^:]|:([^\/]|\/([^\/])))
| : ([^\/]|\/([^\/])))
))))
\S+:\d+
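The explanation:
The first character is either not an h, or it is an h, in which case we move on to the next character.
The second character is either not a t, or it is a t, in which case we move on again. The same goes for the second t and for the p.
Here, it gets tricky: now we encounter three branches. After http, the next character is either neither s nor :, or it is an s that is not followed by ://, or it is a : that is not followed by //.
And finally, if we've gotten this far, then we look for a string of non-whitespace characters, followed by a colon, followed by a string of digits.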
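The broken-up layout can also be kept as working code. The sketch below assumes Python's re module, whose re.VERBOSE flag ignores whitespace in the pattern and allows # comments; the XML engine itself may not offer anything similar. The redundant groups around the innermost [^/] are dropped, which should not change what matches. ANNOTATED and is_allowed are names invented for the example.

```python
import re

ANNOTATED = re.compile(r"""
    ( [^h] | h                  # first character is not h, or it is h and ...
      ( [^t] | t                # ... the next one is not t, or it is t and ...
        ( [^t] | t              # ... the same again for the second t
          ( [^p] | p            # ... and for the p
            ( [^s:]                              # after http: neither s nor :
            | s ( [^:] | : ( [^/] | / [^/] ) )   # https not followed by ://
            | :            ( [^/] | / [^/] )     # http: not followed by //
            )
    ))))
    \S+ : \d+                   # non-whitespace run, a colon, then digits
""", re.VERBOSE)


def is_allowed(candidate: str) -> bool:
    """Whole-string match against the annotated pattern."""
    return ANNOTATED.fullmatch(candidate) is not None


print(is_allowed("httpd://ab-proxy-sample.company.com:8080"))
print(is_allowed("https://ab-proxy-sample.company.com:8080"))
```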
I leave it to a smarter mathematician than myself to ponder whether all strings matchable using lookarounds can be "brute-forced" in such a way.
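This does not settle the general question, but for the narrow case of "does not start with any of these literal prefixes" the brute-force pattern can at least be generated mechanically. The sketch below is one possible way to do it, written in Python with a small trie; not_starting_with and the other names are hypothetical helpers, not part of any library.

```python
import re

END = None  # trie key marking the end of a forbidden prefix


def build_trie(prefixes):
    """Build a nested-dict trie from the forbidden prefixes."""
    root = {}
    for prefix in prefixes:
        if not prefix:
            raise ValueError("empty prefix would forbid every string")
        node = root
        for ch in prefix:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def deviation_pattern(node):
    """Regex fragment matching input that leaves the trie below this node.

    Returns None when the node completes a forbidden prefix.
    """
    if END in node:
        return None
    chars = sorted(node)
    alternatives = ["[^" + "".join(re.escape(c) for c in chars) + "]"]
    for ch in chars:
        sub = deviation_pattern(node[ch])
        if sub is not None:
            alternatives.append(re.escape(ch) + "(?:" + sub + ")")
    return "|".join(alternatives)


def not_starting_with(prefixes):
    """Non-capturing group that consumes the first deviating characters."""
    if not prefixes:
        raise ValueError("at least one prefix is required")
    return "(?:" + deviation_pattern(build_trie(prefixes)) + ")"


pattern = not_starting_with(["http://", "https://"]) + r"\S+:\d+"
print(pattern)
```

Like the hand-written version, the generated group needs to see the character where the input deviates, so a string that is only a fragment of a forbidden prefix (for example "htt") is not matched by it.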
To avoid matching a string starting with some word, use a negative lookahead:
^(?!https?).*$
This will match any string that does not start with http or https. The other requirements are left to the reader as an exercise :-)
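For engines that do support lookaheads, a quick sketch in Python (used here purely as an example of such an engine) shows how the pattern behaves. The second pattern is only one possible take on the exercise, an assumption about what the remaining requirements are: it forbids http:// and https:// specifically and requires a trailing :port. NOT_HTTP and NOT_HTTP_URL_WITH_PORT are invented names.

```python
import re

# The pattern from this answer.
NOT_HTTP = re.compile(r"^(?!https?).*$")

# Hypothetical tightening: reject only the http:// and https:// schemes,
# and require non-whitespace text, a colon and a numeric port.
NOT_HTTP_URL_WITH_PORT = re.compile(r"^(?!https?://)\S+:\d+$")

for text in (
    "ab-proxy-sample.company.com:8080",
    "http://ab-proxy-sample.company.com:8080",
    "httpd://ab-proxy-sample.company.com:8080",
):
    print(
        text,
        bool(NOT_HTTP.match(text)),
        bool(NOT_HTTP_URL_WITH_PORT.match(text)),
    )
```

Note that the first pattern also rejects httpd://..., because that string does start with http; the second one lets it through.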