Regex to match only letters and numbers

Can you help with this code?

It seems easy, but it always fails.

@Test
public void normalizeString(){
    StringBuilder ret =  new StringBuilder();
    //Matcher matches = Pattern.compile( "([A-Z0-9])" ).matcher("P-12345678-P");
    Matcher matches = Pattern.compile( "([\\w])" ).matcher("P-12345678-P");
    for (int i = 1; i < matches.groupCount(); i++)
        ret.append(matches.group(i));

    assertEquals("P12345678P", ret.toString());
}

Constructing a Matcher does not automatically perform any matching. That's in part because Matcher supports several distinct match operations (matches(), lookingAt() and find()), which differ in whether the match must cover the whole region, is implicitly anchored to the beginning of the Matcher's region, or may start anywhere in it. It appears that you could achieve your desired result like so:

@Test
public void normalizeString(){
    StringBuilder ret =  new StringBuilder();
    Matcher matches = Pattern.compile( "[A-Z0-9]+" ).matcher("P-12345678-P");

    while (matches.find()) {
        ret.append(matches.group());
    }

    assertEquals("P12345678P", ret.toString());
}

Note in particular the invocation of Matcher.find(), which was a key omission from your version. Also, the nullary Matcher.group() returns the substring matched by the last find().
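To make the difference between the match operations concrete, here is a small sketch run against the input from the question. The class name and the `findAll` helper are illustrative only; the Matcher methods used (matches(), lookingAt(), find(), group()) are the standard java.util.regex API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchOperations {
    // Illustrative helper: collects the text of every successive find() match, in order.
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {          // each call advances to the next match
            found.add(m.group());   // group() is group(0): the text of the current match
        }
        return found;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("[A-Z0-9]+");
        String input = "P-12345678-P";

        // matches(): the entire input must match the pattern.
        System.out.println(p.matcher(input).matches());          // false

        // lookingAt(): anchored at the start, but need not consume the whole input.
        Matcher m = p.matcher(input);
        System.out.println(m.lookingAt() + " " + m.group());     // true P

        // find(): scans forward for the next match anywhere in the input.
        System.out.println(findAll("[A-Z0-9]+", input));         // [P, 12345678, P]
    }
}
```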

Furthermore, although your use of Matcher.groupCount() isn't exactly wrong, it does lead me to suspect that you have the wrong idea about what it does. In particular, in your code it will always return 1: it inquires about the pattern, not about matches to it.
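A quick way to see that groupCount() depends only on the pattern: the count is the same for any input, and it is available before any match has been attempted. With a count of 1, the original loop condition `i < 1` is false from the start, so the loop body never runs and `ret` stays empty. The helper name `groupCountFor` is hypothetical.

```java
import java.util.regex.Pattern;

class GroupCountDemo {
    // Hypothetical helper: groupCount() of a fresh Matcher over `input`.
    static int groupCountFor(String regex, String input) {
        // No find()/matches() call here: groupCount() needs no match at all.
        return Pattern.compile(regex).matcher(input).groupCount();
    }

    public static void main(String[] args) {
        // The same pattern reports the same count regardless of the input.
        System.out.println(groupCountFor("([\\w])", "P-12345678-P")); // 1
        System.out.println(groupCountFor("([\\w])", ""));             // 1
        // Without parentheses there are no capturing groups.
        System.out.println(groupCountFor("[\\w]", "P-12345678-P"));   // 0
    }
}
```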

First of all, you don't need to add any group, because the entire match can always be accessed as group 0. So instead of

  • (regex) and group(1)

you can use

  • regex and group(0)
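A minimal check of that equivalence, using a hypothetical helper `firstMatch` that returns the requested group of the first find() match:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupZeroDemo {
    // Hypothetical helper: group(index) of the first find() match, or null if there is none.
    static String firstMatch(String regex, String input, int index) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(index) : null;
    }

    public static void main(String[] args) {
        String input = "P-12345678-P";
        // Wrapping the whole regex in a group and reading group(1)...
        System.out.println(firstMatch("([0-9]+)", input, 1)); // 12345678
        // ...gives the same text as the plain regex read through group(0).
        System.out.println(firstMatch("[0-9]+", input, 0));   // 12345678
    }
}
```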

The next thing is that \\w is already a character class, so you don't need to surround it with another [ ]; doing so is similar to writing [[a-z]], which is the same as [a-z].
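The sketch below shows that "\\w" and "[\\w]" select the same characters. It also shows something relevant to the title: in Java, \w means [a-zA-Z_0-9] by default, so it matches the underscore as well as letters and digits. The helper `mask` is hypothetical and simply replaces each matched character with '#'.

```java
class WordClassDemo {
    // Hypothetical helper: replaces every character matched by `regex` with '#'.
    static String mask(String regex, String input) {
        return input.replaceAll(regex, "#");
    }

    public static void main(String[] args) {
        String input = "aB_1-2";
        System.out.println(mask("\\w", input));          // ####-#
        System.out.println(mask("[\\w]", input));        // ####-#  (same set as \w)
        System.out.println(mask("[A-Za-z0-9]", input));  // ##_#-#  (underscore is not a letter or digit)
    }
}
```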

Now, in your

for (int i = 1; i < matches.groupCount(); i++)
    ret.append(matches.group(i));

you iterate over the groups starting from 1, but you exclude the last group: groups are indexed from 1 to n, so i < n will not include n. You would need to use i <= matches.groupCount() instead.
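A sketch of the corrected loop, assuming you want the groups of the first match only. The pattern below has two groups, so with `i < groupCount()` only the year would be collected; with `<=` both are. The class and method names, and the date-like pattern, are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupLoopDemo {
    // Hypothetical helper: groups 1..groupCount() of the first match (empty list if none).
    static List<String> groupsOfFirstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> groups = new ArrayList<>();
        if (m.find()) {
            // `<=` so that the last group (index groupCount()) is included.
            for (int i = 1; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(groupsOfFirstMatch("(\\d{4})-(\\d{2})", "on 2024-05")); // [2024, 05]
    }
}
```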

Also, it looks like you are confusing two things. This loop will not find all matches of the regex in the input. Such a loop is used to iterate over the groups of the regex after a match for the regex has been found.

So if the regex were something like (\\w(\\w))c and the match were abc, then

for (int i = 1; i <= matches.groupCount(); i++)
    System.out.println(matches.group(i));

would print

ab
b

because

  • the first group, (\\w(\\w)), contains the two characters before c
  • the second group is the one nested inside the first, matching the character right after the first character.
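To see the nesting directly, this sketch prints each group of the first match together with its start and end offsets (via Matcher.start(int) and Matcher.end(int)). The `describeGroups` helper and its "text[start,end)" output format are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestedGroupsDemo {
    // Hypothetical helper: describes each group of the first match as "text[start,end)".
    static String describeGroups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return "no match";
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= m.groupCount(); i++) {
            sb.append(m.group(i)).append('[').append(m.start(i)).append(',')
              .append(m.end(i)).append(") ");
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Group 2 (offsets 1..2) lies inside group 1 (offsets 0..2).
        System.out.println(describeGroups("(\\w(\\w))c", "abc")); // ab[0,2) b[1,2)
    }
}
```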

But to print them, you would first need to let the regex engine scan your input with find(), or check whether the entire input matches() the regex. Otherwise you would get an IllegalStateException, because the regex engine can't know which match you want to take your groups from (there can be many matches of the regex in the input).
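A minimal demonstration of that failure mode and its fix: calling group(1) on a freshly created Matcher throws IllegalStateException, while calling it after a successful matches() returns the group. Both helper names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NoMatchYetDemo {
    // Hypothetical helper: tries group(1) before any match operation has been run.
    static String groupOneBeforeMatching(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        try {
            return m.group(1);               // no find()/matches() yet
        } catch (IllegalStateException e) {
            return "IllegalStateException";
        }
    }

    // Hypothetical helper: group(1) after matches(), which requires the whole input to match.
    static String groupOneAfterMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(groupOneBeforeMatching("(\\w(\\w))c", "abc")); // IllegalStateException
        System.out.println(groupOneAfterMatches("(\\w(\\w))c", "abc"));   // ab
    }
}
```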

So what you may want to use is something like

StringBuilder ret =  new StringBuilder();
Matcher matches = Pattern.compile( "[A-Z0-9]" ).matcher("P-12345678-P");
while (matches.find()){//find next match
    ret.append(matches.group(0));
}
assertEquals("P12345678P", ret.toString());
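Both answers' loops give the same string; the only difference is how many matches find() reports. The single-character class "[A-Z0-9]" matches once per character, while "[A-Z0-9]+" matches each run of allowed characters once. The helpers `concatMatches` and `countMatches` are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindLoopDemo {
    // Hypothetical helper: concatenates the text of every match of `regex` in `input`.
    static String concatMatches(String regex, String input) {
        StringBuilder ret = new StringBuilder();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            ret.append(m.group());
        }
        return ret.toString();
    }

    // Hypothetical helper: counts how many times find() succeeds.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String input = "P-12345678-P";
        System.out.println(concatMatches("[A-Z0-9]", input) + " " + countMatches("[A-Z0-9]", input));   // P12345678P 10
        System.out.println(concatMatches("[A-Z0-9]+", input) + " " + countMatches("[A-Z0-9]+", input)); // P12345678P 3
    }
}
```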

Another approach (and probably a simpler solution) would be to remove all the characters you don't want from your input. You could just use replaceAll with a negated character class [^...], like

String input = "P-12345678-P";
String result = input.replaceAll("[^A-Z0-9]+", "");

which will produce a new string in which all characters that are not in A-Z0-9 are removed (replaced with "").
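One thing to keep in mind with "[^A-Z0-9]+": it removes lowercase letters too. If "letters" should include both cases, adding a-z to the class keeps them (ASCII letters only). Both helper names are hypothetical.

```java
class ReplaceAllDemo {
    // Hypothetical helper: keeps only ASCII uppercase letters and digits (the pattern above).
    static String upperAlnumOnly(String input) {
        return input.replaceAll("[^A-Z0-9]+", "");
    }

    // Hypothetical helper: also keeps ASCII lowercase letters.
    static String alnumOnly(String input) {
        return input.replaceAll("[^A-Za-z0-9]+", "");
    }

    public static void main(String[] args) {
        System.out.println(upperAlnumOnly("P-12345678-P")); // P12345678P
        System.out.println(upperAlnumOnly("p-123_x"));      // 123
        System.out.println(alnumOnly("p-123_x"));           // p123x
    }
}
```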

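Disclaimer: The technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.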

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM