
Why does this use of regular expressions in Java throw an "Unclosed character class" exception at runtime?

I have a list of keywords:

String[] keywords = {"xxxx", "yyyy", "zzzz"};
String[] another = {"aaa", "bbb", "ccc"};

I am trying to identify text that contains one of the keywords, followed by a space, followed by one of the "another" words.

If I use:

Pattern pattern = Pattern.compile(keywords+"\\s"+another);

This throws an exception at runtime:

Exception in thread "main" java.util.regex.PatternSyntaxException: Unclosed character class near index 57
[Ljava.lang.String;@3dd4ab05\s[Ljava.lang.String;@5527f4f9
                                                         ^

How can I fix this?

That error is correctly telling you that the pattern you're trying to create is invalid. The gibberish-looking string starting with [Ljava is the string you passed to Pattern.compile().

Java arrays unfortunately do not have a very informative toString() output. What you're doing here is essentially concatenating the default string forms of two arrays, which Pattern cannot hope to parse correctly: the leading [ opens a character class that is never closed, hence the error message.

But even if you called Arrays.toString(), you would still not get what you're looking for:

Pattern pattern=Pattern.compile(Arrays.toString(keywords)+"\\s"+
                                Arrays.toString(another));
System.out.println(pattern.pattern());
 [xxxx, yyyy, zzzz]\s[aaa, bbb, ccc]

This is a technically valid but essentially meaningless regular expression. Each bracketed list is a character class, so the pattern only matches three-character strings: one character from the set x, y, z, comma, or space, followed by one whitespace character, followed by one character from the set a, b, c, comma, or space.
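To make that concrete, here is a small sketch of what the Arrays.toString() pattern accepts and rejects. The class and method names (CharacterClassDemo, arraysToStringPattern) are made up for illustration and are not part of the original question.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class CharacterClassDemo {
    // Hypothetical helper: builds the same (unhelpful) pattern as above,
    // i.e. two character classes separated by \s.
    static Pattern arraysToStringPattern(String[] first, String[] second) {
        return Pattern.compile(Arrays.toString(first) + "\\s" + Arrays.toString(second));
    }

    public static void main(String[] args) {
        String[] keywords = {"xxxx", "yyyy", "zzzz"};
        String[] another = {"aaa", "bbb", "ccc"};
        Pattern pattern = arraysToStringPattern(keywords, another);

        // One character from each class around a whitespace character matches.
        System.out.println(pattern.matcher("x a").matches());      // true
        // Commas are members of both classes too.
        System.out.println(pattern.matcher(", ,").matches());      // true
        // The text the question actually wants to match does not.
        System.out.println(pattern.matcher("xxxx aaa").matches()); // false
    }
}
```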

I would suggest reading more about how regular expressions work. There are lots of resources online to help; good starting points are the Java Regular Expressions lesson and the Pattern documentation. You won't get very far until you understand what regular expressions are trying to do.

As a starting point, however, a regular expression that matches one of several words, followed by a space, followed by one of several other words, might look like this:

(?:xxxx|yyyy|zzzz)\s(?:aaa|bbb|ccc)

This uses "non-capturing groups" and the alternation (logical OR) operator | to specify multiple potential matches. Note that \s matches any whitespace character, not only a space.
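As a rough usage sketch, the expression above could be compiled once and applied with find() to look for a matching pair anywhere in a piece of text. The names AlternationDemo, PAIR, and containsPair are illustrative assumptions, and the sample strings are invented.

```java
import java.util.regex.Pattern;

class AlternationDemo {
    // In Java source the backslash in \s has to be doubled.
    static final Pattern PAIR = Pattern.compile("(?:xxxx|yyyy|zzzz)\\s(?:aaa|bbb|ccc)");

    // Hypothetical helper: true if the text contains a keyword, one
    // whitespace character, and then one of the "another" words.
    static boolean containsPair(String text) {
        return PAIR.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsPair("foo yyyy ccc bar")); // true
        System.out.println(containsPair("yyyy ddd"));         // false
        System.out.println(containsPair("yyyyccc"));          // false
    }
}
```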

[Ljava.lang.String;@3dd4ab05 is the result of calling toString() on a String array.

You need to build your pattern manually from the items in the relevant arrays.
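One possible way to do that, as a minimal sketch rather than a definitive implementation: join each array into a non-capturing alternation and put \s between the two groups. The class and method names (KeywordPatternBuilder, alternation, build) are hypothetical. Pattern.quote() is used here on the assumption that the words should be matched literally, so that a word containing a metacharacter such as "." is not interpreted as regex syntax.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class KeywordPatternBuilder {
    // Joins the words into a non-capturing group of quoted alternatives,
    // e.g. (?:\Qxxxx\E|\Qyyyy\E|\Qzzzz\E).
    static String alternation(String[] words) {
        return Arrays.stream(words)
                .map(Pattern::quote)
                .collect(Collectors.joining("|", "(?:", ")"));
    }

    // One word from the first array, one whitespace character,
    // then one word from the second array.
    static Pattern build(String[] first, String[] second) {
        return Pattern.compile(alternation(first) + "\\s" + alternation(second));
    }

    public static void main(String[] args) {
        String[] keywords = {"xxxx", "yyyy", "zzzz"};
        String[] another = {"aaa", "bbb", "ccc"};
        Pattern pattern = build(keywords, another);

        System.out.println(pattern.matcher("some zzzz bbb text").find()); // true
        System.out.println(pattern.matcher("zzzz  bbb").find());          // false (two spaces)
    }
}
```

If several spaces or tabs between the two words should also count, the "\\s" in build could be changed to "\\s+".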
