Regex matching URL with www and no consecutive dots

Question

Can you help me with a regex?

I have this line:

"Sites www.google.com и www.ridd.rdd..com good."

After parsing, I want to get a line like this:

"Sites http://www.google.com и www.ridd.rdd..com good."

The problem is with checking for consecutive dots. For sites with an error (two dots in a row), "http://" should not be prepended.

My regex:

Matcher matchr = Pattern.compile("w{3}(\\.\\w+)+[a-z]{2,6}").matcher(text);

while (matchr.find()) {
    text = text.replace(matchr.group(0), "http://" + matchr.group(0));
}

System.out.println(text);

Answer 1

Your regex w{3}(\\.\\w+)+[a-z]{2,6} matches part of the second, malformed "URL": the www.ridd.rdd in www.ridd.rdd..com. So you need to make sure the substring you match contains no consecutive dots. You can use word boundaries and a negative lookahead, (?!\\S*\\.{2}).
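To see the partial match concretely, here is a minimal sketch that lists everything the question's pattern finds in the sample line. The class name OriginalPatternDemo and the helper findMatches are made up for illustration; only the pattern and the input come from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OriginalPatternDemo {
    // The pattern from the question, unchanged.
    static final Pattern ORIGINAL = Pattern.compile("w{3}(\\.\\w+)+[a-z]{2,6}");

    // Hypothetical helper: collects every substring the pattern matches.
    static List<String> findMatches(String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = ORIGINAL.matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Prints [www.google.com, www.ridd.rdd]
        System.out.println(findMatches("Sites www.google.com и www.ridd.rdd..com good."));
    }
}
```

The second match stops right before the double dot, which is why the loop in the question still prepends "http://" to the bad site.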

Use

String text = "Sites www.google.com и www.ridd.rdd..com good.";
text = text.replaceAll("\\b(?!\\S*\\.{2})w{3}(\\.\\w+)+[a-z]{2,6}\\b", "http://$0");
// => Sites http://www.google.com и www.ridd.rdd..com good.

See the IDEONE demo

Pattern explanation:

\\b - leading word boundary
(?!\\S*\\.{2}) - the non-whitespace chunk that follows must not contain two consecutive dots
w{3} - match www
(\\.\\w+)+ - 1+ sequences of . followed by 1+ alphanumeric or underscore characters
[a-z]{2,6} - make sure there are 2 to 6 a-z letters...
\\b - at the end of this "word"
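The pieces explained above can be tried out together in a small self-contained class. This is only a sketch: the names UrlPrefixer, WWW_HOST and addScheme are invented here, and the extra inputs are assumptions added to probe the pattern, not cases from the question.

```java
import java.util.regex.Pattern;

public class UrlPrefixer {
    // Same pattern as in the answer, precompiled once.
    private static final Pattern WWW_HOST =
            Pattern.compile("\\b(?!\\S*\\.{2})w{3}(\\.\\w+)+[a-z]{2,6}\\b");

    // Hypothetical helper: prepends "http://" to every match ($0 is the whole match).
    static String addScheme(String text) {
        return WWW_HOST.matcher(text).replaceAll("http://$0");
    }

    public static void main(String[] args) {
        // The sample line from the question: only the first site gets a scheme.
        System.out.println(addScheme("Sites www.google.com и www.ridd.rdd..com good."));
        // A single trailing dot (end of sentence) does not block the match.
        System.out.println(addScheme("See www.example.org."));
        // A double dot anywhere in the chunk blocks it, including a trailing ellipsis.
        System.out.println(addScheme("Visit www.google.com..."));
    }
}
```

Note that the lookahead inspects the whole non-whitespace chunk starting at www, so an otherwise valid host followed directly by an ellipsis is left unchanged as well. Whether that is acceptable depends on the input text.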
