
How do I determine if a string is not a regular expression?

I am trying to improve the performance of some code. It looks something like this:

public boolean isImportant(String token) {
    for (Pattern pattern : patterns) {
        // Pattern has no instance matches(); obtain a Matcher first.
        if (pattern.matcher(token).find()) {
            return true;
        }
    }
    return false;
}

What I noticed is that many of the Patterns seem to be simple string literals with no regular expression constructs. So I want to store these in a separate list (importantList) and do an equality test instead of performing a more expensive pattern match, as follows:

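public boolean isImportant(String token) {
    // Cheap check first: exact match against the literal-only strings.
    if (importantList.contains(token)) return true;

    // Fall back to the remaining, genuine regular expressions.
    for (Pattern pattern : patterns) {
        if (pattern.matcher(token).find()) return true;
    }
    return false;
}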

How do I programmatically determine if a particular string contains no regular expression constructs?

Edit: I should add that the answer doesn't need to be performance-sensitive (i.e. regular expressions can be used). I'm mainly concerned with the performance of isImportant() because it's called millions of times, while the initialization of the patterns is only done once.

I normally hate answers that say this, but...

Don't do that.

It probably won't make the code run faster; in fact, it might even make the program take more time.

If you really need to optimize your code, there are likely much more effective places to look.

It's going to be difficult. You can check for the absence of any regex metacharacters; that should be a good approximation:

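// Character class of regex metacharacters, including '|' for alternation.
Pattern regex = Pattern.compile("[$^()\\[\\]{}.*+?|\\\\]");
Matcher regexMatcher = regex.matcher(subjectString);
// true if subjectString contains at least one metacharacter
boolean regexIsLikely = regexMatcher.find();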
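
As a rough sketch of how such a check could be applied once at start-up, the strings can be split into a set of plain literals and a list of compiled patterns. The class name `PatternSplitter` and its fields are hypothetical, not from the question. The sketch also assumes that a pattern is meant to match the whole token, so it uses `matches()`; with `find()`, as in the question's code, a literal pattern would also match as a substring, and an equality test would not be equivalent.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper: partitions pattern sources into literals and real regexes.
class PatternSplitter {
    // Approximate metacharacter check; '|' is included because alternation
    // is a regex construct too. This is a heuristic, not a full regex parser.
    private static final Pattern META = Pattern.compile("[$^()\\[\\]{}.*+?|\\\\]");

    final Set<String> literals = new HashSet<>();
    final List<Pattern> regexes = new ArrayList<>();

    PatternSplitter(List<String> sources) {
        for (String source : sources) {
            if (META.matcher(source).find()) {
                regexes.add(Pattern.compile(source)); // looks like a regex
            } else {
                literals.add(source);                 // no metacharacters found
            }
        }
    }

    boolean isImportant(String token) {
        // Hash lookup for the literal strings.
        if (literals.contains(token)) return true;
        // Assumption: whole-token match, so literals and regexes behave alike.
        for (Pattern p : regexes) {
            if (p.matcher(token).matches()) return true;
        }
        return false;
    }
}
```

Whether this is actually faster than matching every pattern would need to be measured on real data.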

Whether it's worth it is another question. Are you sure a regex match is slower than a list lookup (especially since you'll be doing a regex match after that in many cases anyway)? I'd bet it's much faster to just keep the regex match.

There is no way to determine it, since every regex pattern is nothing but a string. Furthermore, there is nearly no performance difference, as regex engines are smart nowadays, and I'm pretty sure that if the pattern and source lengths are the same, an equality check is the first thing that will be done.

This is wrong:

    for (Pattern pattern : patterns) 

You should create one big regex that ORs all the patterns; then for each input you only match once.
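
A minimal sketch of that idea, assuming the pattern sources are available as strings and the list is non-empty. The class name `CombinedMatcher` is hypothetical. Each source is wrapped in a non-capturing group so that its own alternations and anchors stay scoped to it. One caveat: numbered backreferences inside the individual patterns would be renumbered by the concatenation, so this only suits patterns without them.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper: one alternation over all pattern sources.
class CombinedMatcher {
    private final Pattern combined;

    CombinedMatcher(List<String> sources) {
        // Assumption: sources is non-empty; an empty list would yield an
        // empty regex, which find() matches against every input.
        String joined = sources.stream()
                .map(s -> "(?:" + s + ")")        // keep each pattern self-contained
                .collect(Collectors.joining("|")); // OR them together
        this.combined = Pattern.compile(joined);
    }

    boolean isImportant(String token) {
        // A single find() replaces the loop over individual patterns.
        return combined.matcher(token).find();
    }
}
```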


 