Stack overflow in Java regular expressions

Question

I am new to Java. I am getting a java.lang.StackOverflowError from the regexes applied to strHindiText. What should I do about it?

try {
     // This regex converts the pattern "{\fldrslt {\fcs1 \ab\af24 \fcs0 &#2345;}{"
     // into "{\fldrslt {\fcs1 \ab\af24 \fcs0 &#2345;}}}{"
     // strHindiText = strHindiText.replaceAll("\\{(\\\\fldrslt[ ])\\{((\\\\\\S+[ ])+)((\\s*&#\\d+;\\s*(-|,|/|\\(|\\)|\"|;|\\.|'|<|>|:|\\?)*)+)\\}\\{","{$1{$2$4}}}{");

     // This regex converts the pattern "{\fcs0 \af0 &#2345;{" or "{\fcs0 \af0 *\tab &#2345;{"
     // into "{\fcs0 \af0 &#2345; }{"
     strHindiText = strHindiText.replaceAll("\\{\\s*((\\\\\\S+[ ](\\*)?)+\\s*)(-|,|/|\\(|\\)|\"|;|\\.|'|<|>|:|\\?)*[ ]*(((&#\\d+;)[ ]*(-|,|/|\\(|\\)|\"|;|\\.|'|<|>|:|\\?)*[ ]*)+)\\{", "{$1 $4$5 }{");

     // This regex converts the pattern "{&#2345; \fcs0 \af0 {"
     // into "{&#2345; \fcs0 \af0 }{"
     strHindiText = strHindiText.replaceAll("\\{\\s*(((&#\\d+;)[ ]*(-|,|/|\\(|\\)|\"|;|\\.|'|<|>|:|\\?)*[ ]*)+)[ ]*((\\\\\\S+[ ])+)\\{", "{$1 $5 }{");

} catch (StackOverflowError er) {
     System.out.println("Third try Block StackOverflowError in regex pattern to reform the rtf tags................");
     er.printStackTrace();
     // throw er;
}

Whenever strHindiText contains a large amount of data, this throws a java.lang.StackOverflowError:

java.lang.StackOverflowError
2013-08-08 15:35:07,743 ERROR [STDERR] (http-127.0.0.1-80-9)    at java.util.regex.Pattern$Curly.match0(Pattern.java:3754)
2013-08-08 15:35:07,743 ERROR [STDERR] (http-127.0.0.1-80-9)    at java.util.regex.Pattern$Curly.match(Pattern.java:3744)
2013-08-08 15:35:07,744 ERROR [STDERR] (http-127.0.0.1-80-9)    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
2013-08-08 15:35:07,744 ERROR [STDERR] (http-127.0.0.1-80-9)    at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3366)
2013-08-08 15:35:07,745 ERROR [STDERR] (http-127.0.0.1-80-9)    at java.util.regex.Pattern$Curly.match0(Pattern.java:3782)
2013-08-08 15:35:07,745 ERROR [STDERR] (http-127.0.0.1-80-9)    at java.util.regex.Pattern$Curly.match(Pattern.java:3744)

My strHindiText data is:

 `{\rtlch\fcs1 \af1\afs18 \ltrch\fcs0 \f1\fs18\cf21\insrsid13505584 &#2349;&#2379;&#2346;&#2366;&#2354;&#32; &#2404; \par }\pard\plain \ltrpar\s16\ql \li0\ri0\sb100\sa100\sbauto1\saauto1\sl240\slmult0\widctlpar\wrapdefault\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0\pararsid13505584 \cbpat20 \rtlch\fcs1 \af0\afs24\alang1025 \ltrch\fcs0 \fs24\lang1033\langfe1033\cgrid\langnp1033\langfenp1033 {\rtlch\fcs1 \ab\af1\afs18 \ltrch\fcs0 \cs21\b\f1\fs18\cf21\insrsid13505584 &#2309;&#2344;&#2381;&#2357;&#2375;&#2359;&#2339;&#32;&#2325;&#2352;&#2375;&#2306;&#32; :}{\rtlch\fcs1 \af1\afs18 \ltrch\fcs0 \f1\fs18\cf21\insrsid13505584  \par &#2349;&#2379;&#2346;&#2366;&#2354;&#32;&#44;&#32;&#2350;&#2343;&#2381;&#2351;&#32;&#2346;&#2381;&#2352;&#2342;&#2375;&#2358;&#32;&#2325;&#2368;&#32;&#2352;&#2366;&#2332;&#2343;&#2366;&#2344;&#2368;&#32;&#2346;&#2381;&#2352;&#2366;&#2325;&#2371;&#2340;&#2367;&#2325;&#32;&#2360;&#2369;&#2306;&#2342`

Answer 1

Option 1 - Treat the symptoms

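Look for recursive constructs in your regex, such as quantified groups of the form (...)+ or (...)*. The stack trace shows the same Pattern$Curly and Pattern$GroupTail frames repeating, which points to a repeated group being matched recursively.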

If you are not sure where your problem lies, try an online regex tester.
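
A local alternative is to cut the regex into fragments and run each one against inputs of growing size, noting where a StackOverflowError first shows up. The sketch below does that for two fragments shaped like the repeated groups in the question. The class name, helper names, and sample units are made up for illustration, and the size at which an overflow appears (if it appears at all) depends on the JVM and its stack size.

```java
import java.util.regex.Pattern;

public class RegexStackProbe {
    // Returns true if matching completes, false if the regex engine overflows the stack.
    static boolean completes(Pattern p, String input) {
        try {
            p.matcher(input).find();
            return true;
        } catch (StackOverflowError e) {
            return false;
        }
    }

    // Builds a test input: `unit` repeated n times, followed by `end`.
    static String repeat(String unit, int n, String end) {
        StringBuilder sb = new StringBuilder(unit.length() * n + end.length());
        for (int i = 0; i < n; i++) {
            sb.append(unit);
        }
        return sb.append(end).toString();
    }

    // Doubles the repetition count until the fragment overflows.
    // Returns that count, or -1 if nothing overflowed up to maxRepeats.
    static int firstOverflow(Pattern fragment, String unit, String end, int maxRepeats) {
        for (int n = 1; n <= maxRepeats; n *= 2) {
            if (!completes(fragment, repeat(unit, n, end))) {
                return n;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Fragments modelled on the question's regexes (simplified, hypothetical).
        Pattern entities = Pattern.compile("(&#\\d+;[ ]*)+\\{");
        Pattern controlWords = Pattern.compile("(\\\\\\S+[ ])+\\{");
        System.out.println("entity loop overflows at: "
                + firstOverflow(entities, "&#2345; ", "{", 1 << 18));
        System.out.println("control-word loop overflows at: "
                + firstOverflow(controlWords, "\\fcs0 ", "{", 1 << 18));
    }
}
```

A fragment that overflows at a few thousand repetitions is a likely candidate for rewriting.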

Option 2 - Treat the cause (much better)

Don't use a regex if there are better tools for your task.

In your case you could search for an RTF parsing library or write your own parser, e.g. like the one that jahroy pointed out in the comments.
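
As a rough illustration of the "write your own parser" route, the sketch below scans the text once with a plain loop, so its stack depth does not grow with the input length. The token categories (braces, backslash control words, &#NNNN; entities, other text) are assumptions drawn from the sample data in the question, not a complete RTF grammar, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class RtfLikeScanner {
    // Characters that end a control word or a run of plain text.
    static boolean isDelimiter(char c) {
        return c == '{' || c == '}' || c == '\\' || Character.isWhitespace(c);
    }

    // If s has "&#digits;" at index i, returns the index just past ';', otherwise -1.
    static int entityEnd(String s, int i) {
        if (!s.startsWith("&#", i)) {
            return -1;
        }
        int j = i + 2;
        while (j < s.length() && Character.isDigit(s.charAt(j))) {
            j++;
        }
        return (j > i + 2 && j < s.length() && s.charAt(j) == ';') ? j + 1 : -1;
    }

    // Splits the input into brace, control-word, entity, and text tokens; whitespace is skipped.
    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            int start = i;
            int end = entityEnd(s, i);
            if (c == '{' || c == '}') {
                i++;                       // a brace is a token on its own
            } else if (end > 0) {
                i = end;                   // a numeric entity such as &#2345;
            } else {
                i++;                       // control word (leading backslash) or plain text
                while (i < s.length() && !isDelimiter(s.charAt(i)) && !s.startsWith("&#", i)) {
                    i++;
                }
            }
            tokens.add(s.substring(start, i));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("{\\fcs0 \\af0 &#2345;{"));
    }
}
```

With the tokens in a list, a rule such as "close the group before the next opening brace" could be written as ordinary list processing instead of nested quantifiers.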

Answer 2

This is not a full answer, just for your information.

In your regex:

(-|,|/|\\(|\\)|\"|;|\\.|'|<|>|:|\\?)* can be written as [-,/()\";.'<>:?]*

Since this pattern occurs twice (in your first regex), this immediately shortens your regex by 40 characters and makes those sections much more readable.
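
A quick way to check the equivalence is to run both forms over a few short samples. This is only a sketch; the class name and the sample strings are invented for the check.

```java
import java.util.regex.Pattern;

public class PunctuationClassDemo {
    // The alternation as written in the question (a capturing group).
    static final Pattern ALTERNATION =
            Pattern.compile("(-|,|/|\\(|\\)|\"|;|\\.|'|<|>|:|\\?)*");

    // The character class suggested above (no capturing group).
    static final Pattern CHAR_CLASS =
            Pattern.compile("[-,/()\";.'<>:?]*");

    // Returns true when both patterns accept, or both reject, the whole input.
    static boolean agree(String input) {
        return ALTERNATION.matcher(input).matches()
                == CHAR_CLASS.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "", "-,/", "()\";.'<>:?", "a", "-a-", "::??" };
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> agree: " + agree(s));
        }
    }
}
```

One thing to watch: the alternation is a capturing group and the character class is not, so after the swap the group numbers used in the replacement strings ($4, $5) shift unless the class is wrapped in parentheses.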

Answer 3

Try this to catch the error:

public class Example {
    public static void endless() {
        endless();
    }

    public static void main(String args[]) {
        try {
            endless();
        } catch(StackOverflowError t) {
            // more general: catch(Error t)
            // anything: catch(Throwable t)
            System.out.println("Caught "+t);
            t.printStackTrace();
        }
        System.out.println("After the error...");
    }
}

More importantly, try increasing the size of the stack (on the JVM this is the -Xss option), and add a "+" after the quantifiers in your regex.

Adding the "+" symbol makes the quantifier possessive, which prevents backtracking; backtracking doesn't seem to be necessary in your case.
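
Both suggestions can be sketched in a few lines. The -Xss option applies to the whole JVM; as an alternative for a single call, the four-argument Thread constructor lets one thread request a larger stack (the JVM treats the value as a hint). The class name, helper name, the 64 MB figure, and the simplified possessive pattern are assumptions for illustration, not a rewrite of the question's full regexes.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.regex.Pattern;

public class BigStackRegex {
    // Possessive loop over entities: "++" never gives characters back once matched.
    static final Pattern POSSESSIVE = Pattern.compile("(?:&#\\d+;[ ]*)++\\{");

    // Runs replaceAll on a dedicated thread that requests stackSize bytes of stack.
    static String replaceAllWithStack(String input, String regex,
                                      String replacement, long stackSize) {
        AtomicReference<String> result = new AtomicReference<>();
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread worker = new Thread(null, () -> {
            try {
                result.set(input.replaceAll(regex, replacement));
            } catch (Throwable t) {
                failure.set(t); // includes StackOverflowError if the stack is still too small
            }
        }, "regex-worker", stackSize);
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for regex", e);
        }
        if (failure.get() != null) {
            throw new IllegalStateException("regex failed", failure.get());
        }
        return result.get();
    }

    public static void main(String[] args) {
        // Third regex from the question, unchanged, run with a 64 MB stack request.
        String regex = "\\{\\s*(((&#\\d+;)[ ]*(-|,|/|\\(|\\)|\"|;|\\.|'|<|>|:|\\?)*[ ]*)+)"
                + "[ ]*((\\\\\\S+[ ])+)\\{";
        String input = "{&#2345; \\fcs0 \\af0 {";
        System.out.println(replaceAllWithStack(input, regex, "{$1 $5 }{", 64L * 1024 * 1024));
        System.out.println(POSSESSIVE.matcher("&#2345; &#2346; {").find());
    }
}
```

A larger stack only raises the input size at which the overflow occurs, so the possessive form (or a parser) is the more durable fix.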

3 answers

Answer 1
3 2013-11-06 13:02:38

Answer 2
1 2013-11-27 11:33:52

Answer 3
0 2013-08-09 02:45:47
