Can you help me with regex?
I have a line:
"Sites www.google.com и www.ridd.rdd..com good."
After parsing, I expect to get this line:
"Sites http://www.google.com и www.ridd.rdd..com good."
The problem is checking for consecutive dots: "http://" should not be prepended to sites with an error (two dots in a row).
My regex:
Matcher matchr = Pattern.compile("w{3}(\\.\\w+)+[a-z]{2,6}").matcher(text);
while (matchr.find()) {
text = text.replace(matchr.group(0), "http://" + matchr.group(0));
}
System.out.println(text);
Your regex w{3}(\.\w+)+[a-z]{2,6} matches part of the second, invalid "URL": it finds www.ridd.rdd inside www.ridd.rdd..com. So you need to make sure the substring you match has no consecutive dots. You may use word boundaries and a negative lookahead (?!\S*\.{2}).
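To see the partial match described above, here is a small sketch that lists everything the question's original pattern finds in the sample line. The class and method names (OriginalRegexMatches, findAll) are hypothetical; the pattern is copied unchanged from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OriginalRegexMatches {
    // The question's pattern, unchanged.
    static final Pattern ORIGINAL = Pattern.compile("w{3}(\\.\\w+)+[a-z]{2,6}");

    // Collects every substring the pattern finds, scanning left to right.
    static List<String> findAll(String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = ORIGINAL.matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // The Cyrillic letter from the question is written as a Unicode escape.
        String text = "Sites www.google.com \u0438 www.ridd.rdd..com good.";
        System.out.println(findAll(text)); // [www.google.com, www.ridd.rdd]
    }
}
```

The second match stops right before the "..", because backtracking lets [a-z]{2,6} take the last two letters of "rdd". That prefix is enough for the question's loop to add "http://" in front of the invalid address.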
Use this instead:
String text = "Sites www.google.com и www.ridd.rdd..com good.";
text = text.replaceAll("\\b(?!\\S*\\.{2})w{3}(\\.\\w+)+[a-z]{2,6}\\b", "http://$0");
// => Sites http://www.google.com и www.ridd.rdd..com good.
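Wrapped into a complete program, the snippet above might look like the following sketch. The class name AddHttpScheme and the helper addScheme are hypothetical; the regex and the "http://$0" replacement are exactly the ones from the answer.

```java
public class AddHttpScheme {
    // The answer's pattern as a Java string literal.
    static final String REGEX = "\\b(?!\\S*\\.{2})w{3}(\\.\\w+)+[a-z]{2,6}\\b";

    // Prefixes each valid www address with http://; $0 refers to the whole match.
    static String addScheme(String text) {
        return text.replaceAll(REGEX, "http://$0");
    }

    public static void main(String[] args) {
        // The Cyrillic letter from the question is written as a Unicode escape.
        String text = "Sites www.google.com \u0438 www.ridd.rdd..com good.";
        // Prints the sample line with http:// added only before www.google.com.
        System.out.println(addScheme(text));
    }
}
```

Since the lookahead runs at the position where www starts, an address containing ".." anywhere in its non-whitespace run is skipped entirely instead of being matched partially.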
Pattern explanation:
\b - leading word boundary
(?!\S*\.{2}) - there should not be any consecutive dots in the non-whitespace chunk that follows
w{3} - match www
(\.\w+)+ - 1+ sequences of . followed by 1+ alphanumeric or underscore characters
[a-z]{2,6} - make sure there are 2 to 6 a-z letters
\b - at the end of this "word"
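If you want the explanation to live next to the regex itself, one option (a sketch, not the only way) is Java's Pattern.COMMENTS flag, which ignores unescaped whitespace and treats everything from # to the end of a line as a comment. The class and method names here are hypothetical; the tokens are the same as in the pattern above.

```java
import java.util.regex.Pattern;

public class CommentedUrlPattern {
    // In COMMENTS mode, whitespace is ignored and '#' starts a comment that
    // runs to the end of the line, so each token can carry its explanation.
    static final Pattern WWW_URL = Pattern.compile(
            "\\b             # leading word boundary\n"
            + "(?!\\S*\\.{2})  # no consecutive dots in the non-whitespace chunk that follows\n"
            + "w{3}            # match www\n"
            + "(\\.\\w+)+      # 1+ sequences of a dot and 1+ word characters\n"
            + "[a-z]{2,6}      # 2 to 6 lowercase letters a-z\n"
            + "\\b             # word boundary at the end\n",
            Pattern.COMMENTS);

    // Prefixes every matching www address with http://; $0 is the whole match.
    static String addScheme(String text) {
        return WWW_URL.matcher(text).replaceAll("http://$0");
    }

    public static void main(String[] args) {
        System.out.println(addScheme("see www.example.org."));
        // prints: see http://www.example.org.
    }
}
```

A single trailing sentence dot, as in "www.example.org.", does not trigger the lookahead, because it only rejects two dots in a row.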