Using non-printable characters with regular expressions in Java

Question

I'm using a regex found here (link) to extract a domain string, and it works fine.

The regex is:

^((?!-)[A-Za-z0-9-]{1,63}(?<!-)\\.)+[A-Za-z]{2,6}$

How could I change it to match a domain that contains a non-printable character instead of the dot (.)?

I know that the regex codes look like \\x01, \\x02, etc., but if I replace the dot with one of them, the regex no longer matches.

Thanks in advance.

Answer 1

Your dot is escaped here.

You need to remove the double escape ( \\\\ ) and replace the dot with the literal character you want to match.

You could also just remove the double escape and keep the dot, which would then match any character.
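
As a minimal sketch of the first suggestion, assuming the separator is the control character U+0001, the escaped dot can be swapped for the regex escape `\x01` (written `"\\x01"` in a Java string literal). The class name `SeparatorRegexDemo` and the method `matches` are hypothetical and not part of the original posts.

```java
import java.util.regex.Pattern;

public class SeparatorRegexDemo {
    // Same pattern as in the question, but the escaped dot is replaced by
    // \x01. Assumption: labels are separated by the control character U+0001.
    static final Pattern CONTROL_SEPARATED = Pattern.compile(
            "^((?!-)[A-Za-z0-9-]{1,63}(?<!-)\\x01)+[A-Za-z]{2,6}$");

    // Returns true when the whole input matches the pattern.
    static boolean matches(String input) {
        return CONTROL_SEPARATED.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Labels separated by U+0001 instead of a dot.
        System.out.println(matches("google\u0001com"));
        // A plain dot is not U+0001, so this input is rejected.
        System.out.println(matches("google.com"));
    }
}
```

Leaving an unescaped `.` in that position instead would accept any single character as the separator, which is what the second suggestion describes.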

Answer 2

An unescaped . will match any single character regardless of whether it is printable (except line terminators, unless DOTALL is enabled). Your current group [A-Za-z0-9-] restricts it. You could change this to "any character except a literal dot", i.e. [^.].

Pattern regex = Pattern.compile("^((?!-)[^.]{1,63}(?<!-)\\.)+[A-Za-z]{2,6}$");
System.out.println(regex.matcher("\u0001\u0002\u0003\u0004..com").find()); // => false
System.out.println(regex.matcher("\u0001\u0002\u0003\u0004.com").find()); // => true
System.out.println(regex.matcher("google.com").find()); // => true

If you're attempting to validate user entry of IDNs (internationalized domain names), note that there are new gTLDs that contain non-alphanumeric characters, for example .شبكة (.network).
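
One possible way to handle such names, sketched here under the assumption that converting to the ASCII-compatible (Punycode) form before validation is acceptable, is the standard `java.net.IDN` class. The class name `IdnDemo` and the helper `toAsciiForm` are hypothetical. The `[A-Za-z]{2,6}` TLD rule from the regex above would still need loosening, because the ASCII forms of such TLDs begin with `xn--`, which `[A-Za-z]` does not accept.

```java
import java.net.IDN;

public class IdnDemo {
    // Converts a Unicode domain name to its ASCII-compatible form.
    // Labels that are already plain ASCII are returned unchanged.
    static String toAsciiForm(String domain) {
        return IDN.toASCII(domain);
    }

    public static void main(String[] args) {
        // "\u0634\u0628\u0643\u0629" is the Arabic TLD from the note above,
        // written with Unicode escapes to avoid source-encoding issues.
        System.out.println(toAsciiForm("example.\u0634\u0628\u0643\u0629"));
        System.out.println(toAsciiForm("google.com"));
    }
}
```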

2 answers

Answer 1
0 2015-08-12 10:55:13

Answer 2
0 Accepted 2015-08-12 11:07:50
