
Use regex in Java with non-printable chars

I'm using a regex found here (link) to extract domain strings, and it works fine.

The regex is:

^((?!-)[A-Za-z0-9-]{1,63}(?<!-)\\.)+[A-Za-z]{2,6}$
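For reference, the same regex can be rewritten with Java's Pattern.COMMENTS flag so that each part carries a note. This is a sketch, and the class name DomainRegexExplained is hypothetical. It shows that the label separator is the single \\. piece, which is the part a non-printable variant has to change.

```java
import java.util.regex.Pattern;

public class DomainRegexExplained {
    // The question's regex, rewritten with Pattern.COMMENTS so each part can carry a note.
    // In COMMENTS mode whitespace is ignored and '#' starts a comment that runs to the end of the line.
    static final Pattern DOMAIN = Pattern.compile(
          "^(                   # one or more labels, each followed by a separator\n"
        + "  (?!-)              # a label may not start with a hyphen\n"
        + "  [A-Za-z0-9-]{1,63} # 1 to 63 ASCII letters, digits or hyphens\n"
        + "  (?<!-)             # a label may not end with a hyphen\n"
        + "  \\.                # the separator: a literal dot and nothing else\n"
        + ")+\n"
        + "[A-Za-z]{2,6}$       # top-level domain: 2 to 6 ASCII letters\n",
        Pattern.COMMENTS);

    public static void main(String[] args) {
        System.out.println(DOMAIN.matcher("google.com").matches());      // true
        System.out.println(DOMAIN.matcher("-google.com").matches());     // false: leading hyphen
        System.out.println(DOMAIN.matcher("google\u0001com").matches()); // false: U+0001 is not a dot
    }
}
```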

How could I change it to match a domain that contains a non-printable character instead of a dot (.)?

I know that the escapes for these characters look like \\x01, \\x02, etc., but if I replace the dot with one of them, the regex no longer matches.

Thanks in advance.

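Your dot is escaped here: in the Java string literal the double backslash becomes a single one, so the regex engine sees \. and matches only a literal dot.

You need to remove the double escape (\\) and replace the dot with the literal character you want to match.

You could also just remove the double escape and keep the bare dot, which matches any character except a line terminator (unless the DOTALL flag is set).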
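To see what each choice of separator does, here is a minimal sketch. It assumes "non-printable" means the C0 control characters U+0000 to U+001F, and the class SeparatorVariants and helper withSeparator are hypothetical. Java's regex syntax does accept the escape \x01 (written "\\x01" in a Java string literal), so if that version does not match, it may be worth checking which control character the input actually contains.

```java
import java.util.regex.Pattern;

public class SeparatorVariants {
    // Label part of the question's regex, unchanged: 1 to 63 chars, no leading/trailing hyphen.
    static final String LABEL = "(?!-)[A-Za-z0-9-]{1,63}(?<!-)";

    // Hypothetical helper: builds the full domain regex around a given separator sub-pattern.
    static Pattern withSeparator(String separator) {
        return Pattern.compile("^(" + LABEL + separator + ")+[A-Za-z]{2,6}$");
    }

    public static void main(String[] args) {
        Pattern literalDot = withSeparator("\\.");           // regex \.   : only a real dot
        Pattern onlyX01    = withSeparator("\\x01");         // regex \x01 : only U+0001
        Pattern anyControl = withSeparator("[\\x00-\\x1F]"); // any C0 control char (assumed meaning of "non-printable")
        Pattern bareDot    = withSeparator(".");             // bare dot: any char except line terminators

        System.out.println(literalDot.matcher("google.com").matches());      // true
        System.out.println(literalDot.matcher("google\u0001com").matches()); // false
        System.out.println(onlyX01.matcher("google\u0001com").matches());    // true
        System.out.println(onlyX01.matcher("google\u0003com").matches());    // false
        System.out.println(anyControl.matcher("google\u0003com").matches()); // true
        System.out.println(bareDot.matcher("google\u0001com").matches());    // true
        System.out.println(bareDot.matcher("googlecom").matches());          // true: the dot consumes the "c"
    }
}
```

The last line shows the trade-off of the bare dot: because it can consume an ordinary letter, even a name with no separator at all passes, so a character class such as [\x00-\x1F] is the narrower choice.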

An unescaped . matches any single character, printable or not, except line terminators such as \n (unless the DOTALL flag is set). In your regex, the character class [A-Za-z0-9-] is what restricts the labels. You could change it to "any character except a literal dot", i.e. [^.].

import java.util.regex.Pattern;
// Labels may now contain any character except a literal dot; the separator is still a real dot.
Pattern regex = Pattern.compile("^((?!-)[^.]{1,63}(?<!-)\\.)+[A-Za-z]{2,6}$");
System.out.println(regex.matcher("\u0001\u0002\u0003\u0004..com").find()); // => false
System.out.println(regex.matcher("\u0001\u0002\u0003\u0004.com").find()); // => true
System.out.println(regex.matcher("google.com").find()); // => true
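The [^.] class is broader than "non-printable". A minimal sketch (the class name NegatedClassCaveats is hypothetical) showing that it accepts any non-dot character, including spaces, slashes and line terminators, and contrasting that with the bare dot, which skips line terminators unless Pattern.DOTALL is set.

```java
import java.util.regex.Pattern;

public class NegatedClassCaveats {
    // The answer's pattern: labels may contain any character except a literal dot.
    static final Pattern REGEX = Pattern.compile("^((?!-)[^.]{1,63}(?<!-)\\.)+[A-Za-z]{2,6}$");

    public static void main(String[] args) {
        // [^.] accepts far more than control characters: spaces and slashes pass too.
        System.out.println(REGEX.matcher("not a/host.com").matches()); // true

        // A negated class also matches line terminators...
        System.out.println(Pattern.compile("[^.]").matcher("\n").matches()); // true
        // ...whereas a bare dot does not, unless DOTALL is enabled.
        System.out.println(Pattern.compile(".").matcher("\n").matches()); // false
        System.out.println(Pattern.compile(".", Pattern.DOTALL).matcher("\n").matches()); // true
    }
}
```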

If you're attempting to validate user entry of IDNs (internationalized domain names), note that there are new gTLDs that contain characters outside [A-Za-z], for example .شبكة (.network), which the [A-Za-z]{2,6} part of the regex rejects.
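To make the IDN note concrete, here is a sketch showing that the question's original regex rejects such a name both in Unicode form and in the ASCII-compatible form produced by java.net.IDN.toASCII (which follows RFC 3490). The host name example.شبكة and the class name IdnCheck are hypothetical, and the Arabic TLD is written with \u escapes so the example does not depend on the source file encoding.

```java
import java.net.IDN;
import java.util.regex.Pattern;

public class IdnCheck {
    // The question's original pattern: the TLD must be 2 to 6 ASCII letters.
    static final Pattern DOMAIN =
            Pattern.compile("^((?!-)[A-Za-z0-9-]{1,63}(?<!-)\\.)+[A-Za-z]{2,6}$");

    // Hypothetical host under the Arabic gTLD; the escapes spell its four Arabic letters.
    static final String UNICODE_NAME = "example.\u0634\u0628\u0643\u0629";

    public static void main(String[] args) {
        // IDN.toASCII turns each non-ASCII label into an ASCII-compatible label prefixed with "xn--".
        String asciiName = IDN.toASCII(UNICODE_NAME);
        System.out.println(asciiName.startsWith("example.xn--"));        // true

        System.out.println(DOMAIN.matcher(UNICODE_NAME).matches());      // false: Arabic letters are not in [A-Za-z]
        System.out.println(DOMAIN.matcher(asciiName).matches());         // false: the "xn--" prefix contains hyphens
        System.out.println(DOMAIN.matcher("example.network").matches()); // false: 7 letters exceeds {2,6}
    }
}
```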
