Regular expressions are very slow

Question

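Background: I am trying to use regular expressions to extract different types of parts from a large array. This operation runs in an AsyncTask. part.plainname is a string of at most 256 characters. item_pattern looks like "^keyword.*?$".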

Problem: I found the method that slows everything down:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public int defineItemAmount(NFItem[] parts, String item_pattern){
    System.out.println("STAMP2");
    int casecount = 0;
    for (NFItem part : parts) {
        if (testItem(part.plainname, item_pattern))
            ++casecount;
    }
    System.out.println("STAMP3");
    return casecount;
}

public boolean testItem(String testString, String item_pattern){
    // Pattern.compile runs again for every part that is tested
    Pattern p = Pattern.compile(item_pattern);
    Matcher m = p.matcher(testString);
    return m.matches();
}

There are only 950 parts, yet it runs very slowly:

02-25 11:34:51.773    1324-1343/com.nfe.unsert.dns_pc_creator I/System.out﹕ STAMP2

02-25 11:35:18.094    1324-1343/com.nfe.unsert.dns_pc_creator I/System.out﹕ STAMP3

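Over 20 seconds just for counting. testItem is used roughly 15 × parts times, so the whole application runs for more than 15 minutes, while an almost identical Java program (a plain Java program, not an Android app) finishes in under 30 seconds.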

Question: What am I doing wrong? Why does a simple regex operation take so long?

Answer 1

Precompile the pattern once instead of on every call:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static int defineItemAmount(NFItem[] parts, String item_pattern){
    System.out.println("STAMP2");
    // Compile once per call, instead of once per part
    Pattern pattern = Pattern.compile(item_pattern);
    int casecount = 0;
    for (NFItem part : parts) {
        if (testItem(part.plainname, pattern))
            ++casecount;
    }
    System.out.println("STAMP3");
    return casecount;
}

public static boolean testItem(String testString, Pattern pattern){
    // Reuses the precompiled Pattern; only a new Matcher is created per part
    Matcher m = pattern.matcher(testString);
    return m.matches();
}
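A further step in the same direction, which is not part of the answer above, is to reuse a single Matcher and point it at each new input with Matcher.reset(CharSequence). The sketch below assumes a plain String[] in place of the asker's NFItem[] (class and method names are hypothetical). A Matcher is not thread-safe, so this only applies when the loop runs on one thread, as inside a single AsyncTask.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: compile the Pattern once and reuse one Matcher for
// every input via reset(). Uses String[] instead of the asker's NFItem[].
class MatcherReuse {

    static int countMatches(String[] names, String itemPattern) {
        Pattern pattern = Pattern.compile(itemPattern); // compiled once
        Matcher m = pattern.matcher("");                // created once
        int count = 0;
        for (String name : names) {
            m.reset(name); // point the same Matcher at the next input
            if (m.matches()) {
                ++count;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] names = {"keyword A", "other B", "keyword C"};
        System.out.println(countMatches(names, "^keyword.*?$"));
    }
}
```

With the three sample names, two start with "keyword", so the count is 2.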

Answer 2

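If you are looking for strings that start with a keyword, you don't need the matches method with a pattern like ^keyword.*?$:

First, the non-greedy quantifier is useless and may slow the regex engine down for nothing; a greedy quantifier gives the same result.
The matches method is anchored by default, so the anchors are not needed and can be removed.
You are only interested in the beginning of the string, so the lookingAt method is more appropriate here, since it does not care what happens at the end of the string.
As the other answers point out, if you use the same pattern several times, compile it once, outside the testItem function. If that is not the case, don't bother compiling it separately at all.
If keyword is a literal string rather than a subpattern, don't use a regex at all; use indexOf to check whether the keyword is at index 0.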
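The alternatives listed above can be compared side by side. This is a minimal sketch assuming the prefix is the literal text "keyword" and that the names contain no line terminators (with a newline after the keyword, matches() on the original pattern fails while the prefix checks succeed). The class and method names are hypothetical. String.startsWith is included because it expresses the same literal check as indexOf(...) == 0 without searching the rest of the string.

```java
import java.util.regex.Pattern;

// Sketch comparing the original full match with the prefix-only checks,
// assuming the prefix is the literal "keyword" and inputs have no newlines.
class PrefixChecks {

    // Original approach: the whole string must match the pattern.
    private static final Pattern FULL = Pattern.compile("^keyword.*?$");

    // lookingAt() only requires the pattern to match at the start of the input.
    private static final Pattern PREFIX = Pattern.compile("keyword");

    static boolean viaMatches(String s) {
        return FULL.matcher(s).matches();
    }

    static boolean viaLookingAt(String s) {
        return PREFIX.matcher(s).lookingAt();
    }

    // No regex; may scan the whole string when "keyword" is not at index 0.
    static boolean viaIndexOf(String s) {
        return s.indexOf("keyword") == 0;
    }

    // No regex; compares only the leading characters.
    static boolean viaStartsWith(String s) {
        return s.startsWith("keyword");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"keyword A", "other keyword", "key"}) {
            System.out.println(s + ": " + viaMatches(s) + " " + viaLookingAt(s)
                    + " " + viaIndexOf(s) + " " + viaStartsWith(s));
        }
    }
}
```

For single-line names all four checks agree; the last two avoid the regex engine entirely.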

Answer 3

You don't need to compile the pattern every time. Do it once, at initialization.

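That said, because of their generality, regular expressions are not especially fast, and they are not meant to be. If your data is regular enough, a dedicated string-splitting technique may serve you better.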
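Compiling at initialization matters most when there are several patterns, as the question's "15 × parts" calls suggest. The sketch below is an assumption about how that could look, not the answerer's code: a hypothetical PartCounter compiles every pattern string once in its constructor and reuses the compiled Pattern objects for all later counts. The pattern strings "^cpu.*$" and "^ram.*$" are invented for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch: compile every item pattern once, when the counter is
// created, and reuse the compiled Patterns for all later counting.
class PartCounter {
    private final Map<String, Pattern> compiled = new LinkedHashMap<>();

    PartCounter(List<String> itemPatterns) {
        for (String p : itemPatterns) {
            compiled.put(p, Pattern.compile(p)); // compile cost paid once here
        }
    }

    int count(String[] names, String itemPattern) {
        Pattern pattern = compiled.get(itemPattern);
        if (pattern == null) {
            throw new IllegalArgumentException("pattern not registered: " + itemPattern);
        }
        int n = 0;
        for (String name : names) {
            if (pattern.matcher(name).matches()) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        PartCounter counter = new PartCounter(Arrays.asList("^cpu.*$", "^ram.*$"));
        String[] names = {"cpu intel", "ram 8gb", "cpu amd"};
        System.out.println(counter.count(names, "^cpu.*$"));
    }
}
```

Throwing on an unregistered pattern is a design choice for the sketch; a lazily filled cache (compiling on first use) would be an equally valid variant.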

Answer 4

Regular expressions are generally slow because a lot goes into constructing them (synchronization, for example).

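Don't call a separate method inside the loop (that may prevent some optimizations); let the VM optimize the for loop. Try this and check the performance: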

Pattern p = Pattern.compile(item_pattern); // compile pattern only once
int casecount = 0;
for (NFItem part : parts) {
    // match inline instead of calling testItem() for each part
    Matcher m = p.matcher(part.plainname);
    if (m.matches())
        ++casecount;
}
// ...
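To check the performance as suggested, a rough timing harness can compare recompiling the pattern for every item with compiling it once. This is a sketch only: it uses System.nanoTime with no warm-up control (a harness such as JMH gives more trustworthy numbers), the 950 synthetic names are invented to mirror the question's part count, and timings on Android's runtime may differ substantially from a desktop JVM. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Rough timing sketch, not a rigorous benchmark: compares compiling the
// pattern per item with compiling it once, on synthetic input.
class RegexTiming {

    static int countRecompiling(String[] names, String itemPattern) {
        int n = 0;
        for (String name : names) {
            // compiles the pattern again for every name
            if (Pattern.compile(itemPattern).matcher(name).matches()) n++;
        }
        return n;
    }

    static int countPrecompiled(String[] names, String itemPattern) {
        Pattern p = Pattern.compile(itemPattern); // compiled once
        int n = 0;
        for (String name : names) {
            if (p.matcher(name).matches()) n++;
        }
        return n;
    }

    // Synthetic names; every third one starts with "keyword".
    static String[] syntheticNames(int size) {
        String[] names = new String[size];
        for (int i = 0; i < size; i++) {
            names[i] = (i % 3 == 0 ? "keyword " : "other ") + i;
        }
        return names;
    }

    public static void main(String[] args) {
        String[] names = syntheticNames(950);
        String pattern = "^keyword.*?$";

        long t0 = System.nanoTime();
        int a = countRecompiling(names, pattern);
        long t1 = System.nanoTime();
        int b = countPrecompiled(names, pattern);
        long t2 = System.nanoTime();

        System.out.printf("recompiling: %d matches in %d us%n", a, (t1 - t0) / 1000);
        System.out.printf("precompiled: %d matches in %d us%n", b, (t2 - t1) / 1000);
    }
}
```

Both methods must return the same count; only the elapsed times are expected to differ, and those vary from run to run.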


4 answers

Answer 1: 1 vote, accepted, 2015-02-25 12:36:57

Answer 2: 1 vote, 2015-02-25 12:56:20

Answer 3: 0 votes, 2015-02-25 12:37:06

Answer 4: 0 votes, 2015-02-25 12:40:12
