
Java regular expression for String.contains

I'm looking for a way to create a regular expression that is 100% equivalent to the "contains" method in the String class. Basically, I have thousands of phrases to search for, and from what I understand it is much better for performance to compile a regular expression once and use it many times than to call "mystring.contains(testString)" over and over on different "mystring" values with the same testString values.

Edit: to expand on my question... I will have many thousands of "testString" values, and I don't want to have to convert them to a format that the regular expression mechanism understands. I just want to pass in a phrase exactly as the user entered it and see whether it is found in whatever value "mystring" happens to contain. A "testString" will never change its value, but there will be thousands of them, which is why I was thinking of creating the matcher object once and re-using it over and over. (Obviously my regexp skills are not up to snuff.)

You can use the LITERAL flag when compiling your pattern to tell the engine that the string is a literal and not a regular expression, e.g.:

 Pattern p = Pattern.compile(yourString, Pattern.LITERAL);
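To show how the compiled literal pattern would then be used, here is a minimal sketch. The class and method names (LiteralContains, compileLiteral, containsLiteral) are made up for illustration. The point to notice is that the check has to use find(), which searches anywhere in the input, and not matches(), which requires the whole input to equal the phrase.

```java
import java.util.regex.Pattern;

class LiteralContains {
    // Compile once. With Pattern.LITERAL, characters such as '.' or '('
    // in the phrase match themselves and have no special meaning.
    static Pattern compileLiteral(String phrase) {
        return Pattern.compile(phrase, Pattern.LITERAL);
    }

    // find() looks for the phrase anywhere in the input, like String.contains.
    static boolean containsLiteral(Pattern literal, String input) {
        return literal.matcher(input).find();
    }

    public static void main(String[] args) {
        Pattern p = compileLiteral("a.b (c)");
        System.out.println(containsLiteral(p, "x a.b (c) y")); // true
        System.out.println(containsLiteral(p, "x aXb (c) y")); // false: '.' is not a wildcard here
    }
}
```

A Pattern can be shared and reused freely; a new Matcher is created for each input string.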

But are you really sure that doing that and then reusing the result is faster than just String#contains? Faster by enough to make the added complexity worth it?

Well, you could use Pattern.quote to get a "piece of regular expression" for each input string. Do any of your terms contain line breaks? If so, that could make life slightly trickier, though far from impossible.

Anyway, you'd basically just join the quoted terms together with "|" between them:

Pattern pattern = Pattern.compile("quoted1|quoted2|quoted3|...");

You might want to use Guava's Joiner to join the quoted strings together easily, although it's obviously not terribly hard to do manually.
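As a rough sketch of the joining step, the version below uses the standard library's Collectors.joining in place of Guava's Joiner (a substitution, not what the answer names). The class and method names are hypothetical. Each phrase goes through Pattern.quote first, so user input is never interpreted as regex syntax.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class AnyPhraseMatcher {
    // Quote every phrase, then join the quoted pieces into one alternation.
    // Note: an empty list would produce an empty pattern, which matches any input.
    static Pattern compileAny(List<String> phrases) {
        String alternation = phrases.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        return Pattern.compile(alternation);
    }

    // True if at least one of the phrases occurs somewhere in the input.
    static boolean containsAny(Pattern anyPhrase, String input) {
        return anyPhrase.matcher(input).find();
    }

    public static void main(String[] args) {
        Pattern p = compileAny(Arrays.asList("foo.bar", "a|b", "1+1"));
        System.out.println(containsAny(p, "value is a|b here")); // true
        System.out.println(containsAny(p, "fooXbar or a or b")); // false
    }
}
```

This only answers whether any phrase matched. To learn which one, you could read matcher.group() after a successful find(), since the matched text is the phrase itself.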

However, I would try this and then test whether it's actually more efficient than just calling contains. Do you already have a benchmark showing that contains is too slow?
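One crude way to check is sketched below. It is only a rough timing loop with System.nanoTime, not a rigorous benchmark; a harness such as JMH would give more trustworthy numbers. The input data, the phrase, and all names here are invented for illustration, so substitute your own strings before drawing conclusions.

```java
import java.util.regex.Pattern;

class ContainsTiming {
    // Baseline: plain String.contains on every input.
    static int countWithContains(String[] inputs, String phrase) {
        int hits = 0;
        for (String s : inputs) {
            if (s.contains(phrase)) hits++;
        }
        return hits;
    }

    // Candidate: one precompiled literal pattern, a new Matcher per input.
    static int countWithPattern(String[] inputs, Pattern literal) {
        int hits = 0;
        for (String s : inputs) {
            if (literal.matcher(s).find()) hits++;
        }
        return hits;
    }

    public static void main(String[] args) {
        String phrase = "42 lorem"; // made-up search phrase
        Pattern literal = Pattern.compile(phrase, Pattern.LITERAL);
        String[] inputs = new String[100_000]; // made-up test data
        for (int i = 0; i < inputs.length; i++) {
            inputs[i] = "record " + i + " lorem ipsum";
        }

        // Warm-up rounds so the JIT has a chance to compile both paths.
        for (int round = 0; round < 20; round++) {
            countWithContains(inputs, phrase);
            countWithPattern(inputs, literal);
        }

        long t0 = System.nanoTime();
        int a = countWithContains(inputs, phrase);
        long t1 = System.nanoTime();
        int b = countWithPattern(inputs, literal);
        long t2 = System.nanoTime();

        System.out.printf("contains: %d hits in %d us%n", a, (t1 - t0) / 1_000);
        System.out.printf("pattern:  %d hits in %d us%n", b, (t2 - t1) / 1_000);
    }
}
```

Both methods should report the same hit count; if they do not, the comparison is not measuring equivalent work.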


 