
Regex expression taking too much time

I have the regex patterns below in Java code, and in some cases they take a long time to complete. Is there a way to improve this?

String decimal = "([0-9]+(\\.[0-9]+)?[/-]?)+";
String units = "(in|ft)\\.?";
String unitName = "(cu\\.? *ft|gauge|watt|rpm|ft|lbs|K|GPF|btu|mph|cfm|volt|oz|pounds|dbi|miles|amp|hour|kw|f|degrees|year)";

    sizePattern.add(Pattern.compile("(?i)" + decimal + " *" + units + " *x? *" + decimal + " *" + units + " *x? *" + decimal + " *" + units + ""));
    sizePattern.add(Pattern.compile("(?i)" + decimal + " *" + units + " *x? *" + decimal + " *" + units));
    sizePattern.add(Pattern.compile("(?i)" + decimal + " *x *" + decimal + " *" + units));
    sizePattern.add(Pattern.compile("(?i)" + decimal + "( *" + units + ")"));
    sizePattern.add(Pattern.compile("(?i)" + decimal + "( *sq?\\.?)( *ft?\\.?)"));
    sizePattern.add(Pattern.compile("(?i)" + decimal + " *" + unitName));
    sizePattern.add(Pattern.compile("(?i)" + decimal + "(d)"));
    sizePattern.add(Pattern.compile("(?i)" + decimal + "( *(%|percent))"));
    sizePattern.add(Pattern.compile("(?i)" + decimal));

    for (Pattern p : sizePattern)
    {
        ODebug.Write(Level.FINER, "PRD-0079: Using pattern = " + p.pattern());

        m = p.matcher(_data);
        while (m.find()) 
        {
            ODebug.Write(Level.FINER, "           Got => [" + m.group(0) + "]");
            this.Dimensions.add(m.group(0));

            _data = _data.replaceAll("\\Q" + m.group(0) + "\\E", ".");
            m = p.matcher(_data);
        }
    }

String causing the issue:

Micro-Induction Cooktop provides the best in cooktop performance, safety and efficiency. Induction heats as electricity flows through a coil to produce a magnetic field under the ceramic plate. When a ferromagnetic cookware is placed on the ceramic surface, currents are induced in the cookware and instant heat is generated due to the resistance of the pan. Heat is generated to the pan only and no heat is lost. As there are no open flames, inductions are safer to use than conventional burners. Once cookware is removed, all molecular activity ceases and heating is stopped immediately.Flush surface for built-in or freestanding applicationDual functions: Cook and Warm7 power settings (100-300-500-700-900-1100-1300W)* The 2 lowest power settings cannot be actually achieved, but are ""simulated"":100W = 500W intermittently heat for 2 seconds and stop for 8 seconds300W = 500W intermittently heat for 6 seconds and stop for 4 seconds13 Keep Warm settings (100-120-140-160-180-190-210-230-250-280-300-350-390F)Touch sensitive panel with control lockUp to 8 hours timerMicro-crystal ceramic plateAutomatic pan detectionLED panelETL/ETL-Sanitation/FCC certified for household or commercial useHome Depot Protection Plan:

Assuming your _data is long, it's not the matching that takes the time, but rather the assignment

_data = _data.replaceAll("\\Q" + m.group(0) + "\\E", ".");

which is O(n**2) in terms of the string length overall, since every call rescans the whole string and the loop runs it once per match. Just don't do it.

You could write it more simply as

_data = _data.replace(m.group(0), ".");

but just don't do it. Do you need a reduced _data at the end? If so, use a single replaceAll per pattern (it uses a StringBuffer internally and is only linear in the size of the string).
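A minimal sketch of that single pass per pattern, assuming you want both the list of matches and the reduced string. The class and method names (`SinglePassExtract`, `extract`) are made up for illustration, and the two patterns in `main` are simplified stand-ins for the ones in the question. It uses `Matcher.appendReplacement` and `appendTail`, the same building blocks `replaceAll` is based on, so the matches can be collected while the new string is being built:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SinglePassExtract {

    // Hypothetical helper: for each pattern, walks the input once, records
    // every match in 'found', and replaces it with "." in the output.
    static String extract(List<Pattern> patterns, String data, List<String> found) {
        for (Pattern p : patterns) {
            Matcher m = p.matcher(data);
            StringBuffer sb = new StringBuffer();
            while (m.find()) {
                found.add(m.group());
                // "." contains no '$' or '\', so it is safe as a replacement string.
                m.appendReplacement(sb, ".");
            }
            m.appendTail(sb);
            data = sb.toString(); // one new string per pattern, not one per match
        }
        return data;
    }

    public static void main(String[] args) {
        // Simplified stand-in patterns, not the full set from the question.
        List<Pattern> patterns = Arrays.asList(
                Pattern.compile("(?i)[0-9]+ *(?:in|ft)\\.?"),
                Pattern.compile("[0-9]+"));
        List<String> found = new ArrayList<>();
        String reduced = extract(patterns, "Box 12 in x 3 ft, qty 7", found);
        System.out.println(found);
        System.out.println(reduced);
    }
}
```

This is not a drop-in replacement in every edge case: the original loop replaces every occurrence of the matched text anywhere in the string and then restarts the search from the beginning, so the two versions can disagree when a match also appears inside a longer token.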

Additionally:

  • Use non-capturing groups.
  • Consider recycling the Matcher by using reset(CharSequence) and usePattern(Pattern).
  • Consider combining all the patterns into one. As all of them start the same way, it could be quite efficient.
  • Your decimal can probably get slow when there's no match. Leaving out the optional parts, you get "([0-9]+)+", which can backtrack needlessly a lot. Consider using atomic groups.
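To illustrate the first and last points, here is a sketch of the decimal building block rewritten with non-capturing groups and possessive quantifiers (a close relative of atomic groups: `X++` behaves like `(?>X+)`). The names `AtomicDecimal`, `DECIMAL_POSSESSIVE`, `SIZE` and `firstMatch` are invented for this example. The assumption, which I have not proven for every pattern in the question, is that none of them needs the decimal part to give characters back, because what follows it never starts with a digit, '.', '/' or '-':

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AtomicDecimal {

    // Original from the question: "([0-9]+(\\.[0-9]+)?[/-]?)+"
    // A run such as "100" can be split as 100, 10|0, 1|00 or 1|0|0, and on a
    // failed match the engine tries every split of every run.

    // Same shape, but each piece keeps what it matched (possessive) and
    // nothing is captured.
    static final String DECIMAL_POSSESSIVE = "(?:[0-9]++(?:\\.[0-9]++)?+[/-]?+)++";

    static final String UNITS = "(?:in|ft)\\.?";

    // Corresponds to the "decimal *units" pattern in the question.
    static final Pattern SIZE =
            Pattern.compile("(?i)" + DECIMAL_POSSESSIVE + " *" + UNITS);

    // Returns the first match in the input, or null if there is none.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch(SIZE, "Shelf depth 12.5 in. max"));
        // The temperature list from the problem string: no unit follows, so
        // there is no match, and the possessive version gives up without
        // retrying the splits.
        System.out.println(firstMatch(SIZE,
                "(100-120-140-160-180-190-210-230-250-280-300-350-390F)"));
    }
}
```

With the original decimal, the second call has to exhaust the alternative splits of the hyphen-separated numbers before it can report failure, which is the kind of input where the slowdown would show up.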
