
How to match a regular expression without creating objects in Java?

I'm working on a regular expression matching function. The problem is that the framework will call this function inside a nested loop. If temporary objects are created, GC will cause a serious performance problem.

Is it possible to work with regular expressions without creating temporary objects (Pattern, Matcher)? Rewriting the regexp classes is my last resort...

Your best bet is to deal with the issues as and when they arise, which they probably won't. Garbage collecting large numbers of short-lived objects was a performance problem around a decade ago, but the JVM is now incredibly good at it.

If you do need to optimise, do it by changing the GC options (the size of the young generation, for instance) rather than by trying to optimise in code.

Matcher objects are not thread-safe, so you cannot reuse them unless you call the reset() method (which should work fine within a single thread). See "Is Java Regex Thread Safe?"
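As a minimal sketch of the reset() approach mentioned above: the class and method names here (MatcherReuse, countMatches) are made up for illustration, and the pattern "a.*b" is just a placeholder. One Matcher is created before the loop and pointed at each new input with Matcher.reset(CharSequence), so the loop body itself does not create a new Matcher per call. This only holds if the Matcher stays confined to a single thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MatcherReuse {
    // Compiled once; Pattern instances are immutable and can be shared.
    private static final Pattern REGEX = Pattern.compile("a.*b");

    /** Counts inputs that match the whole pattern, using a single Matcher for the loop. */
    static int countMatches(String[] inputs) {
        Matcher matcher = REGEX.matcher("");   // the only Matcher created; keep it on one thread
        int count = 0;
        for (String input : inputs) {
            // reset(CharSequence) re-targets the existing Matcher at the new input
            if (matcher.reset(input).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] inputs = {"ab", "a--b", "abc", "b"};
        System.out.println(countMatches(inputs));   // prints 2
    }
}
```

Whether this is measurably faster than calling regex.matcher(input) each time depends on the JVM and workload, so it is worth measuring before adopting it.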

To quote an old saying:

Make it work, make it right, make it fast. (in that order)

So before going down any heavy optimization path, just write the straightforward, appropriate code first (which in this case would involve pre-compiling your patterns if you can). Run some tests to see whether the performance is inadequate, and optimize only if the regex portion turns out to be a bottleneck.

If object creation (and cleanup) is a serious bottleneck (as compared to the actual regex matching itself), then you may need to implement your own solution that uses an object pool, so that objects are not created but reset and reused from the pool. I doubt that this will result in any serious performance gains though, so you should benchmark first just to see how much gain is even possible (if you improved object creation / cleanup performance by 50%, would it be worth it?).
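One simple way to approximate the pooling idea, as a sketch rather than a full object pool: keep one Matcher per thread in a ThreadLocal and reset it on every call. The class name PerThreadMatcher and the pattern "a.*b" are assumptions made for this example. Because each thread only ever sees its own Matcher, the thread-safety concern does not arise.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PerThreadMatcher {
    private static final Pattern REGEX = Pattern.compile("a.*b");

    // One Matcher per thread, created lazily the first time that thread asks for it.
    private static final ThreadLocal<Matcher> MATCHER =
            ThreadLocal.withInitial(() -> REGEX.matcher(""));

    /** Returns true if the whole input matches; reuses the calling thread's Matcher. */
    static boolean matches(CharSequence input) {
        return MATCHER.get().reset(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("a123b"));   // prints true
        System.out.println(matches("abc"));     // prints false
    }
}
```

Note that the Matcher keeps a reference to the last input it was reset with, and the ThreadLocal lookup has a cost of its own, so a benchmark is still needed to tell whether this beats plain allocation.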

This sounds like premature optimization.

Write the most straightforward code you can, then profile it in a realistic setting, and see whether there are any problems with performance or memory allocation patterns. If there are, address the specific issues you've uncovered.

Modern JVMs are incredibly good at garbage collecting short-lived objects.
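A rough way to check this before reaching for a profiler is to read the JVM's own GC counters around the hot loop. The sketch below uses the standard java.lang.management API; the class name GcProbe, the iteration count and the sample inputs are arbitrary choices for illustration, and the numbers it prints will vary by JVM, collector and heap settings.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.regex.Pattern;

final class GcProbe {
    /** Total number of collections so far, summed over all collectors the JVM reports. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();   // -1 means undefined for this collector
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Approximate accumulated collection time in milliseconds, summed over all collectors. */
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = gc.getCollectionTime();     // -1 means undefined for this collector
            if (time > 0) {
                total += time;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Pattern regex = Pattern.compile("a.*b");
        String[] inputs = {"ab", "a--b", "abc", "b"};   // fixed inputs: no string building in the loop

        long gcCountBefore = totalGcCount();
        long gcTimeBefore = totalGcTimeMillis();
        long start = System.nanoTime();

        int hits = 0;
        for (int i = 0; i < 5_000_000; i++) {
            // a fresh Matcher per iteration, i.e. the straightforward code under test
            if (regex.matcher(inputs[i & 3]).matches()) {
                hits++;
            }
        }

        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("hits=" + hits + ", elapsed=" + elapsedMs + " ms"
                + ", collections=" + (totalGcCount() - gcCountBefore)
                + ", gcTime=" + (totalGcTimeMillis() - gcTimeBefore) + " ms");
    }
}
```

If the reported GC time is a small fraction of the elapsed time, allocation of Matcher objects is unlikely to be the bottleneck. A single timed loop like this is only a first indication; a proper benchmark harness gives more trustworthy numbers.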

You can precompile your regexes, which makes sense if you reuse the same regex multiple times.

Instead of

boolean foundMatch = subjectString.matches("a.*b");

(where a temporary compiled Pattern will be created anyway), you can use

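// requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
Pattern regex = Pattern.compile("a.*b");   // compile once, outside the loop
// loop here
    // do something...
    Matcher regexMatcher = regex.matcher(subjectString);
    boolean foundMatch = regexMatcher.matches();
// loop end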

It's hard to say whether there will be any relevant performance benefit, though.

It seems like you are optimizing early, which in your case will most probably be useless. How about you run some tests and then decide?

Cheers, Eugene

The way I see it, you have 2 viable options:

  1. Write your own regex-matching logic, using the source code of Pattern and Matcher as a reference.
  2. Explicitly initiate the cleanup of those objects when you are done with them by running their corresponding finalize() function instead of waiting for the GC to run it.

Pros and Cons

  1. A lot of work that needs to be tested and maintained in the future; however, you get full control over what you are trying to do.

  2. Interfering with the workings of the GC is not recommended, but it is a clean and simple solution. (Note that finalize() is a protected method of Object, and calling it is an ordinary method call that does not by itself free the object's memory.)
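To give a feel for option 1 at its smallest scale, here is a hedged sketch of a hand-written check for one specific pattern, "a.*b", rather than a general regex engine. The names HandWrittenMatch and matchesAThenB are invented for this example. It relies on the documented default behaviour that "." does not match line terminators, and it allocates nothing per call.

```java
final class HandWrittenMatch {
    /**
     * Intended to agree, for this one pattern only, with
     * Pattern.compile("a.*b").matcher(s).matches() under default flags.
     */
    static boolean matchesAThenB(CharSequence s) {
        int n = s.length();
        // needs at least "ab", must start with 'a' and end with 'b'
        if (n < 2 || s.charAt(0) != 'a' || s.charAt(n - 1) != 'b') {
            return false;
        }
        // everything in between must be matchable by '.', i.e. not a line terminator
        for (int i = 1; i < n - 1; i++) {
            if (isLineTerminator(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Line terminators that '.' skips by default: \n, \r, U+0085, U+2028, U+2029.
    private static boolean isLineTerminator(char c) {
        return c == '\n' || c == '\r' || c == 0x85 || c == 0x2028 || c == 0x2029;
    }

    public static void main(String[] args) {
        System.out.println(matchesAThenB("a--b"));   // prints true
        System.out.println(matchesAThenB("a\nb"));   // prints false
    }
}
```

This only scales to patterns simple enough to hand-code; anything more general means reimplementing a regex engine, which is the maintenance cost listed above.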
