
Is there an elegant way to do partial regex matches in Java?

What I need is to check whether a given string partially matches a given regex. For example, for the regex ab[0-9]c, the strings "a", "ab", "ab3", and "b3c" would "match", but not the strings "d", "abc", or "a3c". What I've been doing is the clunky a(?:b(?:[0-9](?:c)?)?)? (which only works for some of the partial matches, specifically those which "begin" to match), but since this is part of an API, I'd rather give the users a more intuitive way of entering their matching regexps.
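To make the limitation of the nested-optional workaround concrete, here is a minimal sketch that runs it against a few of the strings above. The class and method names (`NestedOptionalDemo`, `isPrefix`) are made up for illustration; only the regex itself comes from the question.

```java
import java.util.regex.Pattern;

class NestedOptionalDemo {
    // The hand-written "prefix" form of ab[0-9]c from the question.
    static final Pattern PREFIX = Pattern.compile("a(?:b(?:[0-9](?:c)?)?)?");

    // Hypothetical helper: true if s is a non-empty prefix of something matching ab[0-9]c.
    static boolean isPrefix(String s) {
        return PREFIX.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"a", "ab", "ab3", "ab3c", "b3c", "abc"}) {
            System.out.println(s + " -> " + isPrefix(s));
        }
    }
}
```

Prefixes such as "a", "ab" and "ab3" are accepted, while "b3c" is rejected because the pattern only describes strings that start at the beginning of the regex.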

In case the description's not very clear (and I realize it might not be!), this will be used for validating text input on text boxes. I want to prevent any editing that would result in an invalid string, but I can't just match the string against a regular regex, since it would not match until it's fully entered. For example, using the regex above (ab[0-9]c), when I attempt to enter 'a', it's disallowed, since the string "a" does not match the regex.

Basically, it's a sort of reverse startsWith() which works on regexps. (new Pattern("ab[0-9]c").startsWith("ab3") should return true.)

Any ideas?

Is Matcher.hitEnd() what you're looking for?

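import java.util.regex.Matcher;
import java.util.regex.Pattern;
// The wrapper method and its name are illustrative, added so the snippet compiles inside a class.
static boolean partiallyMatches(String theRegexString, String theStringToTest) {
    Pattern thePattern = Pattern.compile(theRegexString);
    Matcher m = thePattern.matcher(theStringToTest);
    if (m.matches()) {
        return true; // the input is already a complete match
    }
    // hitEnd() is true when the matcher ran out of input, so more input might still match.
    return m.hitEnd();
}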
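As a rough check, the sketch below applies the same matches()/hitEnd() test to the strings from the question. The class name `PartialMatchDemo` and the method name `partiallyMatches` are assumptions for illustration, not part of any library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PartialMatchDemo {
    // True if input fully matches regex, or if the matcher reached the end
    // of the input while still trying to match (more input could help).
    static boolean partiallyMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() || m.hitEnd();
    }

    public static void main(String[] args) {
        String regex = "ab[0-9]c";
        for (String s : new String[] {"a", "ab", "ab3", "ab3c", "b3c", "d", "abc", "a3c"}) {
            System.out.println(s + " -> " + partiallyMatches(regex, s));
        }
    }
}
```

Note that this treats only prefixes as partial matches: "a", "ab", "ab3" and "ab3c" are accepted, while "b3c" is rejected along with "d", "abc" and "a3c". For validating text as it is typed from left to right, that is probably the behavior wanted.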

Although there may be some trickery available, your way is probably the best semantically. It accurately describes what you're looking for.

However, the bigger issue is whether you really need to validate every single time a character is typed into the text box. Why can't you just validate it once at the end and save yourself some headaches?

Here is a regex that can solve your particular example:

^(?:a|b|[0-9]|c|ab|b[0-9]|[0-9]c|ab[0-9]|b[0-9]c|ab[0-9]c)?$

Generally speaking, if you can break the regex down into atomic parts, you can OR together all possible contiguous groupings of them, but the result is big and ugly. In this case, there were 4 parts (a, b, [0-9], and c), so you had to OR together 4+3+2+1=10 possibilities. (For n parts, there are n(n+1)/2 possibilities.) You might be able to generate this algorithmically, but it would be a huge pain to test. And anything complex (like a subgroup) would be very difficult to get right.
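For what it's worth, generating the alternation could look roughly like the sketch below. It assumes the caller has already split the regex into atomic parts by hand, and that no part contains a top-level alternation or a group; the class and method names (`SubsequenceRegex`, `build`) are invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class SubsequenceRegex {
    // Builds ^(?:...)?$ where the alternatives are all contiguous runs of parts,
    // ordered by run length and then by starting position.
    static String build(List<String> parts) {
        List<String> alternatives = new ArrayList<>();
        int n = parts.size();
        for (int len = 1; len <= n; len++) {
            for (int start = 0; start + len <= n; start++) {
                alternatives.add(String.join("", parts.subList(start, start + len)));
            }
        }
        return "^(?:" + String.join("|", alternatives) + ")?$";
    }

    public static void main(String[] args) {
        String regex = build(Arrays.asList("a", "b", "[0-9]", "c"));
        System.out.println(regex);
        Pattern p = Pattern.compile(regex);
        for (String s : new String[] {"a", "ab", "ab3", "b3c", "d", "abc", "a3c"}) {
            System.out.println(s + " -> " + p.matcher(s).matches());
        }
    }
}
```

For the four parts above this yields the same 10-alternative regex shown earlier. The hard part, splitting an arbitrary user-supplied regex into parts, is left out entirely.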

A better solution is probably just to have a message beside the input field telling the user "not enough info" or something, and to change it to a green checkbox or something when they have it right. Here's a recent article from A List Apart that weighs the pros and cons of different approaches to this problem: Inline Validation in Web Forms.
