Validate proxy URL using XML regex pattern

Question

I am using an XML regex pattern to validate my proxy URL.

Example proxy: ab-proxy-sample.company.com:8080

My requirement :

Should not begin with http:// OR https:// (Match the whole word)
Should accept any string + a port
Should accept even strings starting with ht

My current XML regex is : [^http://|https://].+:[0-9]+|

But it's matching each letter instead of the whole word?

Any help would be highly appreciated. Thanks in advance!

Answer 1

As @arnep points out, you're attempting to use a negated character class with alternation, which isn't the way it works: [^...] matches a single character that is not in the listed set, so it cannot exclude a whole word. Also, here is some information regarding lookaheads.
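To make the character-class point concrete, here is a small sketch in Python. Python's re module is only a stand-in for the XML engine (an assumption), the trailing | from the question's pattern is left out, and proxy.company.com is a made-up host. Inside the brackets the characters form a set, so the first element means "one character that is not h, t, p, s, :, / or |".

```python
import re

# Stand-in for the question's pattern (trailing "|" omitted). Python's re is
# assumed here for illustration only; it is not the asker's XML engine.
ORIGINAL = re.compile(r"[^http://|https://].+:[0-9]+")


def original_accepts(candidate: str) -> bool:
    """Return True if the whole candidate matches the question's pattern."""
    return ORIGINAL.fullmatch(candidate) is not None


for candidate in (
    "ab-proxy-sample.company.com:8080",    # first char "a": accepted
    "htab-proxy-sample.company.com:8080",  # first char "h": rejected
    "proxy.company.com:8080",              # first char "p": rejected
):
    print(candidate, original_accepts(candidate))
```

The last two hosts are rejected only because of their first letter, which is why the pattern appears to match letter by letter rather than word by word.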

I'm sure someone else will post an answer you can copy and paste, but this is a useful opportunity to learn the basics of regex!

UPDATE:

I didn't realize that you were using an engine that doesn't support negative lookarounds. Without negative lookarounds, it's nearly impossible to achieve what you're trying to do.

Nearly ;)

Here is a "brute-force" combinatoric method of doing it:

(?:[^h]|h(?:[^t]|t(?:[^t]|t(?:[^p]|p(?:[^s:]|s(?:[^:]|:(?:[^\/]|\/(?:[^\/])))|:(?:[^\/]|\/(?:[^\/])))))))\S+:\d+

If the XML engine doesn't support non-capturing groups, i.e. (?: ... ), then use regular groups instead:

([^h]|h([^t]|t([^t]|t([^p]|p([^s:]|s([^:]|:([^\/]|\/([^\/])))|:([^\/]|\/([^\/])))))))\S+:\d+

If the XML engine doesn't support character classes like \S and \d, then use [^ \t\r\n] and [0-9] instead.

Here is a running example: http://rubular.com/r/JnpCVgeLmL. Try changing the test string. You'll see that:

    ab-proxy-sample.company.com:8080          # matches
    htab-proxy-sample.company.com:8080        # matches
    http://ab-proxy-sample.company.com:8080   # doesn't
    https://ab-proxy-sample.company.com:8080  # doesn't
    httpd://ab-proxy-sample.company.com:8080  # matches

Note that you do not need ^ and $ in the XML pattern. I added them only for the Rubular demo; the XML engine apparently treats the pattern as anchored at both ends already.
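For anyone who wants to reproduce the table above outside Rubular, here is a minimal sketch in Python. It assumes Python's re module as a stand-in for the XML engine and uses fullmatch to imitate the implicit anchoring; is_valid_proxy is a name invented for this example.

```python
import re

# The non-capturing pattern from this answer, split across three raw strings
# purely for readability (Python joins adjacent string literals).
BRUTE_FORCE = re.compile(
    r"(?:[^h]|h(?:[^t]|t(?:[^t]|t(?:[^p]|p(?:[^s:]"  # h, t, t, p, then fork
    r"|s(?:[^:]|:(?:[^\/]|\/(?:[^\/])))"             # the "https://" path
    r"|:(?:[^\/]|\/(?:[^\/])))))))\S+:\d+"           # the "http://" path, then rest
)


def is_valid_proxy(candidate: str) -> bool:
    """fullmatch imitates an engine that anchors the pattern at both ends."""
    return BRUTE_FORCE.fullmatch(candidate) is not None


for candidate in (
    "ab-proxy-sample.company.com:8080",
    "htab-proxy-sample.company.com:8080",
    "http://ab-proxy-sample.company.com:8080",
    "https://ab-proxy-sample.company.com:8080",
    "httpd://ab-proxy-sample.company.com:8080",
):
    print(candidate, is_valid_proxy(candidate))
```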

How does this work? It's easier to understand if we break it up like this:

    ([^h] | h
    ([^t] | t
    ([^t] | t
    ([^p] | p
    ([^s:]| s ([^:]|:([^\/]|\/([^\/])))
          | :        ([^\/]|\/([^\/])))
    ))))
    \S+:\d+

The explanation:

If the first char isn't an "h", then great! (The string can't possibly be "http://" or "https://".)
If the first char is an "h" though, then:
1. If the second char isn't a "t", then great! (The string can't possibly be "http://" or "https://".)
2. If the second char is a "t" though, then:
  1. ... isn't "t", great!
  2. ... is "t", then:
    1. ... isn't "p", great!
    2. ... is "p", then:

Here, it gets tricky: now we encounter three branches.

If the fifth char is neither an "s" nor a ":", then great!
If the fifth char is an "s" though, then:
1. If the sixth char isn't a ":", then great!
2. If the sixth char is a ":" though, then:
  1. If the seventh char isn't a "/", then great!
  2. If the seventh char is a "/" though, then:
    1. If the eighth char isn't a "/", then great!
    2. Otherwise, fail! We found an "https://".
If the fifth char is a ":" though, then:
1. If the sixth char isn't a "/", then great!
2. If the sixth char is a "/" though, then:
  1. If the seventh char isn't a "/", then great!
  2. Otherwise, fail! We found an "http://".

And finally, if we've gotten this far, then we look for a string of non-whitespace characters, followed by a colon, followed by a string of digits.
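The branching walked through above is mechanical enough to generate. The sketch below builds the same kind of prefix guard from a list of forbidden prefixes by walking a trie. It is written in Python as a stand-in for the XML engine, build_trie and reject_prefixes are names invented here, and it assumes at least one non-empty prefix and that no forbidden prefix is itself the start of another one.

```python
import re


def build_trie(prefixes):
    """Build a nested-dict trie; an empty dict marks the end of a prefix."""
    root = {}
    for prefix in prefixes:
        node = root
        for ch in prefix:
            node = node.setdefault(ch, {})
    return root


def reject_prefixes(node):
    """Return a regex fragment that diverges from every prefix in the trie.

    Assumes no forbidden prefix is itself a prefix of another one.
    """
    # First branch: the next character is on none of the forbidden paths.
    branches = ["[^" + "".join(re.escape(ch) for ch in node) + "]"]
    # Other branches: follow a forbidden path one character further.
    # A child with no children would complete a forbidden prefix, so it gets
    # no branch at all; that missing branch is what makes the match fail.
    for ch, child in node.items():
        if child:
            branches.append(re.escape(ch) + reject_prefixes(child))
    return "(?:" + "|".join(branches) + ")"


PREFIX_GUARD = reject_prefixes(build_trie(["http://", "https://"]))
PROXY = re.compile(PREFIX_GUARD + r"\S+:\d+")

print(PREFIX_GUARD)
```

The generated guard has the same shape as the hand-written one; only the order of the two branches after "p" differs, because it follows the order in which the prefixes were inserted.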

I leave it to a smarter mathematician than myself to ponder whether all strings matchable using lookarounds can be "brute-forced" in such a way.

Answer 2

To avoid matching a string that starts with a given word, use a negative lookahead:

^(?!https?).*$

This matches any string that does not start with http or https. The other requirements are left to the reader as an exercise :-)
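As a hedged take on that exercise, the sketch below compares the pattern as written with a tightened variant. The variant is an assumption of this example, not part of the answer: it puts :// inside the lookahead so that only the two schemes are excluded, and it adds the host:port shape. It only helps on engines that support lookahead, which Answer 1 suggests the asker's XML engine does not; Python's re is used here as a stand-in.

```python
import re

# Answer 2's pattern as written.
LOOSE = re.compile(r"^(?!https?).*$")
# Tightened variant (assumption, not from the answer): exclude only the two
# schemes and require non-whitespace text, a colon, and a numeric port.
STRICT = re.compile(r"^(?!https?://)\S+:\d+$")


def accepts(pattern, candidate):
    """Return True if the anchored pattern matches the candidate."""
    return pattern.match(candidate) is not None


for candidate in (
    "ab-proxy-sample.company.com:8080",
    "http://ab-proxy-sample.company.com:8080",
    "https://ab-proxy-sample.company.com:8080",
    "httpd://ab-proxy-sample.company.com:8080",
):
    print(candidate, accepts(LOOSE, candidate), accepts(STRICT, candidate))
```

The two patterns disagree on the httpd:// case: the original lookahead rejects anything beginning with "http", while the tightened one accepts it, in line with the requirement that strings merely starting with "ht" stay valid.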

solution1
1 ACCEPTED 2012-06-22 14:34:36

solution2
0 2012-06-25 08:34:24
