Regex first character not matching
I am having some Java Pattern problems. This is my pattern:
"^[\\p{L}\\p{Digit}~._-]+$"
It matches any letter of the US-ASCII, numerals, some special characters, basically anything that wouldn't scramble a URL.
What I would like is to find the first character in a word that does not match this pattern. Basically, the user sends a text as input, and I have to validate it and throw an exception if I find an illegal character.
I tried negating this pattern, but it wouldn't compile properly. Also, find() didn't help much.
A legal input would be hello, while ?hello should not be, and my exception should point out that ? is not proper.
I would prefer a suggestion using Java's Matcher, Pattern or something else from util.regex. It's not a necessity, but checking each character in the string individually is not a solution.
Edit: I came up with a better regex to match unreserved URI characters
Try this:
^[\\p{L}\\p{Digit}.'-.'_]*([^\\p{L}\\p{Digit}.'-.'_]).*$
The first non-matching character is captured in group 1.
I made a few tries here: http://fiddle.re/gkkzm6
Explanation:
I negated your pattern, so I built this:
[^\\p{L}\\p{Digit}.'-.'_]     [^...] means every character except for
  ^                     ^      the following ones.
  | your pattern inside |
The pattern has 3 parts:
^[\\p{L}\\p{Digit}.'-.'_]*
Consumes allowed characters from the start of the string until it reaches a non-matching character
([^\\p{L}\\p{Digit}.'-.'_])
The non-matching character (the negated class) inside a capturing group
.*$
Any characters up to the end of the string.
Hope it helps you
EDIT:
The correct regex should be:
^[\\p{L}\\p{Digit}~._-]*([^\\p{L}\\p{Digit}~._-]).*$
It is the same method; I only changed the contents of the first and second parts.
I tried it and it seems to work.
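As a rough sketch of how this regex could be wired into the validation described in the question: the class name `FirstIllegalCharDemo` and the helpers `firstIllegal` and `validate` are hypothetical names chosen for illustration, and the `Pattern.DOTALL` flag is an addition of this sketch (without it, `.*` stops at a line break and input containing one could fail to match).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstIllegalCharDemo {
    // Regex from the answer above. DOTALL is an assumption of this sketch so that
    // ".*" can also run across line breaks in the input.
    private static final Pattern FIRST_ILLEGAL = Pattern.compile(
            "^[\\p{L}\\p{Digit}~._-]*([^\\p{L}\\p{Digit}~._-]).*$", Pattern.DOTALL);

    /** Returns the first illegal character as a String, or null if none is found. */
    static String firstIllegal(String input) {
        Matcher m = FIRST_ILLEGAL.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    /** Hypothetical validation helper: throws if the input contains an illegal character. */
    static void validate(String input) {
        String bad = firstIllegal(input);
        if (bad != null) {
            throw new IllegalArgumentException("Illegal character '" + bad + "' in: " + input);
        }
    }

    public static void main(String[] args) {
        validate("hello"); // legal input, no exception
        try {
            validate("?hello");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Note that this regex requires at least one illegal character in order to match, so an empty string is reported as having none, even though the original `+` pattern would reject an empty string. If empty input should be illegal, it needs a separate check.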
Try this one to find the first invalid character:
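import java.util.regex.Matcher;
import java.util.regex.Pattern;

// '^' negates a character class only at its very start; repeating it inside would whitelist a literal '^'.
Pattern negPattern = Pattern.compile(".*?([^\\p{L}\\p{Digit}.'-.'_]+).*");
Matcher matcher = negPattern.matcher("hel?lo");
if (matcher.matches())
{
    // group(1) holds the first run of invalid characters; print its first character
    System.out.println("'" + matcher.group(1).charAt(0) + "'");
}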
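A possible alternative, not taken from any of the answers here: since the question mentions that find() did not help, it may be worth noting that find() combined with just the negated character class seems to locate the first invalid character directly, along with its position. This is only a sketch; `IllegalCharFinder` and `indexOfFirstIllegal` are hypothetical names, and the character class is the one from the edited question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IllegalCharFinder {
    // Negated version of the allowed set from the question: matches any single illegal character.
    private static final Pattern ILLEGAL = Pattern.compile("[^\\p{L}\\p{Digit}~._-]");

    /** Returns the index of the first illegal character, or -1 if the input is clean. */
    static int indexOfFirstIllegal(String input) {
        Matcher m = ILLEGAL.matcher(input);
        // find() scans left to right, so the first hit is the first illegal character.
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"hello", "?hello", "hel?lo"}) {
            int i = indexOfFirstIllegal(s);
            System.out.println(s + " -> " + (i < 0 ? "ok" : "illegal '" + s.charAt(i) + "' at index " + i));
        }
    }
}
```

Because no anchors or `.*` are involved, line breaks in the input need no special flag here.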
The "^[\\p{L}\\p{Digit}.'-.'_]+$" pattern matches any string that consists entirely of one or more characters from the character class. Note that the doubled ' and . are suspicious: you might be unaware that '-. creates a range that matches '()*+,-. (every character from ' through .). If that is not on purpose, I think you meant to use .'_- instead.
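To make the range behaviour concrete, here is a small sketch that tests a handful of punctuation characters against both spellings of the class. The class name `RangeDemo` and the helper `matchedBy` are made up for this illustration.

```java
import java.util.regex.Pattern;

class RangeDemo {
    /** Returns the characters of 'candidates' that the given character class matches. */
    static String matchedBy(String charClass, String candidates) {
        Pattern p = Pattern.compile(charClass);
        StringBuilder sb = new StringBuilder();
        for (char c : candidates.toCharArray()) {
            if (p.matcher(String.valueOf(c)).matches()) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String candidates = "'()*+,-./_";
        // '-. is read as a range from ' (0x27) to . (0x2E)
        System.out.println(matchedBy("[.'-.'_]", candidates)); // prints '()*+,-._
        // with the hyphen at the end it is a literal, so no range is created
        System.out.println(matchedBy("[.'_-]", candidates));   // prints '-._
    }
}
```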
To check whether a string starts with a character other than those defined in the character class, you can negate the character class and check only the first character in the string:
if (str.matches("[^\\p{L}\\p{Digit}.'_-].*")) {
/* String starts with the disallowed character */
}
I also think you can shorten the regex to "(?U)[^\\w.'-].*". At any rate, \\p{Digit} can be replaced with \\d.
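A quick sketch comparing the explicit class with the (?U) shorthand on a few sample inputs; `UnicodeClassDemo` and the constant names are hypothetical. As far as I know, with (?U) (UNICODE_CHARACTER_CLASS) \w covers somewhat more than letters, digits and underscore (combining marks, for example), so the two forms should not be assumed to be exactly equivalent.

```java
class UnicodeClassDemo {
    // Explicit negated class from the answer above, and the suggested shorthand.
    static final String EXPLICIT = "[^\\p{L}\\p{Digit}.'_-].*";
    static final String SHORT = "(?U)[^\\w.'-].*";

    public static void main(String[] args) {
        // "\u00e9t\u00e9" is the word "ete" with acute accents, written with escapes
        // so the example does not depend on the source file encoding.
        String[] samples = {"hello", "?hello", "\u00e9t\u00e9", "_x", "9lives"};
        for (String s : samples) {
            System.out.println(s + " -> explicit=" + s.matches(EXPLICIT)
                    + ", short=" + s.matches(SHORT));
        }
        // Without (?U), \w is ASCII-only, which is why the flag is needed here.
        System.out.println("\u00e9".matches("\\w"));     // false
        System.out.println("\u00e9".matches("(?U)\\w")); // true
        // For an ASCII digit, \d and \p{Digit} agree.
        System.out.println("7".matches("\\d") && "7".matches("\\p{Digit}")); // true
    }
}
```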