Why doesn't this Java regular expression work?

I need to create a regular expression that allows a string to contain any number of:

  • alphanumeric characters
  • spaces
  • (
  • )
  • &
  • .

No other characters are permitted. I used RegexBuddy to construct the following regex, which works correctly when I test it within RegexBuddy:

\w* *\(*\)*&*\.*

Then I used RegexBuddy's "Use" feature to convert this into Java code, but it doesn't appear to work correctly in a simple test program:

public class RegexTest
{
  public static void main(String[] args)
  {
    String test = "(AT) & (T)."; // Should be valid
    System.out.println("Test string matches: "
      + test.matches("\\w* *\\(*\\)*&*\\.*")); // Outputs false
  }
}

I must admit that I have a bit of a blind spot when it comes to regular expressions. Can anyone explain why it doesn't work, please?

That regular expression tests for any number of word characters (\w matches letters, digits, and the underscore), followed by any number of spaces, followed by any number of open parens, followed by any number of close parens, followed by any number of ampersands, followed by any number of periods, in exactly that order.

What you want is...

test.matches("[\\w \\(\\)&\\.]*")

As mentioned by mmyers, this allows the empty string. If you do not want to allow the empty string...

test.matches("[\\w \\(\\)&\\.]+")

Though that will also allow a string that is only spaces, or only periods, etc. If you want to ensure at least one alphanumeric character...

test.matches("[\\w \\(\\)&\\.]*\\w+[\\w \\(\\)&\\.]*")

To understand what the regular expression is saying: anything within square brackets ("[]") denotes a set of characters. So, where "a*" means zero or more a's, "[abc]*" means zero or more characters, each of which is an a, a b, or a c, in any order.
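As a rough illustration of the three variants above, the following sketch runs each pattern against a few sample strings. The class name, constant names, and sample inputs are made up for this example; only the pattern strings come from the answer.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the three pattern strings are the ones quoted above.
class CharClassDemo {
    // Zero or more allowed characters (also accepts the empty string).
    static final Pattern ANY = Pattern.compile("[\\w \\(\\)&\\.]*");
    // One or more allowed characters (rejects the empty string).
    static final Pattern NON_EMPTY = Pattern.compile("[\\w \\(\\)&\\.]+");
    // Allowed characters only, with at least one word character somewhere.
    static final Pattern HAS_WORD_CHAR =
        Pattern.compile("[\\w \\(\\)&\\.]*\\w+[\\w \\(\\)&\\.]*");

    // True only if the pattern matches the entire input.
    static boolean matches(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"(AT) & (T).", "", "   ", "a#b"};
        for (String s : samples) {
            System.out.printf("\"%s\": any=%b nonEmpty=%b hasWordChar=%b%n",
                s, matches(ANY, s), matches(NON_EMPTY, s), matches(HAS_WORD_CHAR, s));
        }
    }
}
```

With these samples, "(AT) & (T)." should pass all three checks, the empty string only the first, a string of spaces the first two, and "a#b" none of them, because "#" is not in the set.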

Maybe I'm misunderstanding your description, but aren't you essentially defining a class of characters without an order rather than a specific sequence? Shouldn't your regexp have a structure of [xxxx]+, where xxxx are the actual characters you want?

The difference between your Java code snippet and the Test tab in RegexBuddy is that the matches() method in Java requires the regular expression to match the whole string, while the Test tab in RegexBuddy allows partial matches. If you use your original regex in RegexBuddy, you'll see multiple blocks of yellow and blue highlighting. That indicates RegexBuddy found multiple partial matches in your string. To get a regex that works as intended with matches(), you need to edit it until the whole test subject is highlighted in yellow, or, if you turn off highlighting, until the Find First button selects the whole text.

Alternatively, you can use the anchors \A and \Z (written "\\A" and "\\Z" in a Java string literal) at the start and the end of your regex to force it to match the whole string. When you do that, your regex always behaves in the same way, whether you test it in RegexBuddy or use matches() or another method in Java. Only matches() requires a full string match; other Matcher methods such as find() and lookingAt() allow partial matches.
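The difference between a full and a partial match can be sketched as follows. The class and helper names are invented for this example; the regex strings are the ones from the question and the first answer, and the anchors are added as described above.

```java
import java.util.regex.Pattern;

// Hypothetical demo class contrasting matches() with find().
class FullVsPartialDemo {
    // The original regex from the question: an ordered sequence of optional runs.
    static final String ORIGINAL = "\\w* *\\(*\\)*&*\\.*";
    // The character-class regex wrapped in \A ... \Z anchors.
    // \Z tolerates one trailing line terminator; \z would be stricter.
    static final String ANCHORED_CLASS = "\\A[\\w \\(\\)&\\.]*\\Z";

    // matches() succeeds only if the regex consumes the whole input.
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // find() succeeds if the regex matches anywhere in the input.
    static boolean partialMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String test = "(AT) & (T).";
        System.out.println("matches(), original: " + fullMatch(ORIGINAL, test));
        System.out.println("find(), original:    " + partialMatch(ORIGINAL, test));
        // Every part of the original regex is optional, so find() even
        // succeeds (with an empty match) on input made of forbidden characters.
        System.out.println("find(), original on ###: " + partialMatch(ORIGINAL, "###"));
        // With anchors, find() behaves like a whole-string test.
        System.out.println("find(), anchored class:  " + partialMatch(ANCHORED_CLASS, test));
        System.out.println("find(), anchored on a#b: " + partialMatch(ANCHORED_CLASS, "a#b"));
    }
}
```

This is why a partial-match tester can appear to accept the original regex: it reports matching fragments, not a match of the entire string.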

The regex

\w* *\(*\)*&*\.*

will match the items you described, but only in the order you described, with each one repeated as many times as you like. So "skjhsklasdkjgsh((((())))))&&&&&....." matches, but a string that mixes the characters in any other order does not.
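A quick way to check this claim is a small sketch like the one below. The class and method names are invented for illustration; the regex is the one from the question.

```java
// Hypothetical demo class: the original regex accepts only one fixed ordering.
class OrderedSequenceDemo {
    static final String ORIGINAL = "\\w* *\\(*\\)*&*\\.*";

    // String.matches() requires the regex to match the whole string.
    static boolean accepts(String input) {
        return input.matches(ORIGINAL);
    }

    public static void main(String[] args) {
        // Word characters, then parens, then ampersands, then periods: accepted.
        System.out.println(accepts("skjhsklasdkjgsh((((())))))&&&&&....."));
        // Same kinds of characters in a different order: rejected.
        System.out.println(accepts("a(b)"));
        System.out.println(accepts(")("));
    }
}
```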

You want a regex like this:

[\w \(\)&\.]+

which will allow a mix of all the characters.

Edit: my regex knowledge is limited, so the above syntax may not be perfect.
