Stack overflow in Java regex
I am new to Java. I am getting a java.lang.StackOverflowError from the regexes applied to strHindiText.
What should I do about it?
try {
// This regex converts the pattern "{\fldrslt {\fcs1 \ab\af24 \fcs0 ऩ}{"
// into "{\fldrslt {\fcs1 \ab\af24 \fcs0 ऩ}}}{"
// strHindiText = strHindiText.replaceAll("\\{(\\\\fldrslt[ ])\\{((\\\\\\S+[ ])+)((\\s*&#\\d+;\\s*(-|,|/|\\(|\\)|\"|;|\\.|'|<|>|:|\\?)*)+)\\}\\{","{$1{$2$4}}}{");
// This regex converts the pattern "{\fcs0 \af0 ऩ{ or {\fcs0 \af0 *\tab ऩ{"
// into "{\fcs0 \af0 ऩ }{"
strHindiText = strHindiText.replaceAll("\\{\\s*((\\\\\\S+[ ](\\*)?)+\\s*)(-|,|/|\\(|\\)|\"|;|\\.|'|<|>|:|\\?)*[ ]*(((&#\\d+;)[ ]*(-|,|/|\\(|\\)|\"|;|\\.|'|<|>|:|\\?)*[ ]*)+)\\{", "{$1 $4$5 }{");
// This regex converts the pattern "{ऩ \fcs0 \af0 {"
// into "{ऩ \fcs0 \af0 }{"
strHindiText = strHindiText.replaceAll("\\{\\s*(((&#\\d+;)[ ]*(-|,|/|\\(|\\)|\"|;|\\.|'|<|>|:|\\?)*[ ]*)+)[ ]*((\\\\\\S+[ ])+)\\{", "{$1 $5 }{");
} catch(StackOverflowError er) {
System.out.println("Third try Block StackOverflowError in regex pattern to reform the rtf tags................");
er.printStackTrace();
// throw er;
}
Whenever strHindiText contains a large amount of data, the code throws a java.lang.StackOverflowError:
java.lang.StackOverflowError
2013-08-08 15:35:07,743 ERROR [STDERR] (http-127.0.0.1-80-9) at java.util.regex.Pattern$Curly.match0(Pattern.java:3754)
2013-08-08 15:35:07,743 ERROR [STDERR] (http-127.0.0.1-80-9) at java.util.regex.Pattern$Curly.match(Pattern.java:3744)
2013-08-08 15:35:07,744 ERROR [STDERR] (http-127.0.0.1-80-9) at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
2013-08-08 15:35:07,744 ERROR [STDERR] (http-127.0.0.1-80-9) at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3366)
2013-08-08 15:35:07,745 ERROR [STDERR] (http-127.0.0.1-80-9) at java.util.regex.Pattern$Curly.match0(Pattern.java:3782)
2013-08-08 15:35:07,745 ERROR [STDERR] (http-127.0.0.1-80-9) at java.util.regex.Pattern$Curly.match(Pattern.java:3744)
My strHindiText data is:
`{\rtlch\fcs1 \af1\afs18 \ltrch\fcs0 \f1\fs18\cf21\insrsid13505584 भोपाल  । \par }\pard\plain \ltrpar\s16\ql \li0\ri0\sb100\sa100\sbauto1\saauto1\sl240\slmult0\widctlpar\wrapdefault\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0\pararsid13505584 \cbpat20 \rtlch\fcs1 \af0\afs24\alang1025 \ltrch\fcs0 \fs24\lang1033\langfe1033\cgrid\langnp1033\langfenp1033 {\rtlch\fcs1 \ab\af1\afs18 \ltrch\fcs0 \cs21\b\f1\fs18\cf21\insrsid13505584 अन्वेषण करें  :}{\rtlch\fcs1 \af1\afs18 \ltrch\fcs0 \f1\fs18\cf21\insrsid13505584 \par भोपाल , मध्य प्रदेश की राजधानी प्राकृतिक सुंद`
Look for recursion in your regex, that is, quantified groups such as (...)+ or (a|b)*. java.util.regex matches these recursively, so a long input can exhaust the thread's stack.
If you are not sure where your problem lies, try a regex tester.
Don't use a regex if there are better tools for your task.
In your case you could search for an RTF parsing library or write your own parser, e.g. like the one that jahroy pointed out in the comments.
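As a rough illustration of the "write your own parser" option, here is a minimal sketch of an iterative RTF tokenizer. The class and method names are made up for this example, and the rules are deliberately simplified: it only separates group braces, control words or symbols, and plain text, and it does not decode \'hh hex escapes, \uN characters, or binary data. Because it is a plain loop, its stack usage does not grow with the input size.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified RTF tokenizer (not a complete RTF parser).
class RtfTokenizerSketch {

    static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Splits the input into "{", "}", control words/symbols, and text runs.
    static List<String> tokenize(String rtf) {
        List<String> tokens = new ArrayList<>();
        int n = rtf.length();
        int i = 0;
        while (i < n) {
            char c = rtf.charAt(i);
            if (c == '{' || c == '}') {
                tokens.add(String.valueOf(c));
                i++;
            } else if (c == '\\') {
                int start = i++;
                if (i < n && isAsciiLetter(rtf.charAt(i))) {
                    // Control word: letters, then an optional signed number.
                    while (i < n && isAsciiLetter(rtf.charAt(i))) i++;
                    if (i + 1 < n && rtf.charAt(i) == '-'
                            && isAsciiDigit(rtf.charAt(i + 1))) i++;
                    while (i < n && isAsciiDigit(rtf.charAt(i))) i++;
                    tokens.add(rtf.substring(start, i));
                    // A single space after a control word is a delimiter.
                    if (i < n && rtf.charAt(i) == ' ') i++;
                } else {
                    // Control symbol: backslash plus one character, e.g. \*
                    if (i < n) i++;
                    tokens.add(rtf.substring(start, i));
                }
            } else {
                // Plain text up to the next brace or backslash.
                int start = i;
                while (i < n && "{}\\".indexOf(rtf.charAt(i)) < 0) i++;
                tokens.add(rtf.substring(start, i));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize("{\\rtlch\\fcs1 \\af1\\afs18 भोपाल \\par }")) {
            System.out.println("[" + token + "]");
        }
    }
}
```

With a token list like this, a fix such as "insert a missing } before the next {" becomes a loop over tokens with a brace counter instead of one large regex.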
This is not a full answer but just for your information.
In your regex:
(-|,|/|\\(|\\)|\"|;|\\.|'|<|>|:|\\?)*
can be written as
[-,/()\";.'<>:?]*
Since this pattern occurs twice (in your first regex), this immediately shortens your regex by 40 characters and makes those sections much more readable.
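To check that the two forms accept the same strings, a small comparison like the following can help. It is only a sketch; the class name and the sample inputs are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class comparing the alternation with the character class.
class PunctuationClassDemo {
    // The alternation from the question, written as a Java string literal.
    static final Pattern ALTERNATION =
            Pattern.compile("(-|,|/|\\(|\\)|\"|;|\\.|'|<|>|:|\\?)*");
    // The character class suggested as a replacement.
    static final Pattern CHAR_CLASS =
            Pattern.compile("[-,/()\";.'<>:?]*");

    // True when both patterns agree on whether the whole input matches.
    static boolean sameResult(String input) {
        return ALTERNATION.matcher(input).matches()
                == CHAR_CLASS.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "", "-,/()\";.'<>:?", "?:;", "a", "-a-", " " };
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> alternation="
                    + ALTERNATION.matcher(s).matches()
                    + ", class=" + CHAR_CLASS.matcher(s).matches());
        }
    }
}
```

Inside a character class the characters ( ) . ? need no escaping, and a leading - is taken literally, which is why the class is so much shorter.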
Try this to catch the error:
public class Example {
    // Recurses forever, so it always ends in a StackOverflowError.
    public static void endless() {
        endless();
    }

    public static void main(String[] args) {
        try {
            endless();
        } catch (StackOverflowError t) {
            // more general: catch(Error t)
            // anything: catch(Throwable t)
            System.out.println("Caught " + t);
            t.printStackTrace();
        }
        System.out.println("After the error...");
    }
}
More importantly, try increasing the size of the thread stack (the JVM option -Xss controls it), or add "+" after the quantifiers in your regex to make them possessive.
Adding the "+" symbol changes the operator to prevent backtracking, which doesn't seem to be necessary in your case.
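Both ideas can be tried in isolation before touching the real patterns. The sketch below uses a simplified fragment modelled on the (\\\\\\S+[ ])+ part of the question's regex, not the full expression. The class name, the helper methods, and the 256 MB stack size are assumptions made for this example. Note that the stackSize argument of the Thread constructor is only a hint that a JVM may ignore, and whether the greedy version overflows on the default stack depends on your JVM settings.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.regex.Pattern;

// Hypothetical demo: possessive quantifiers and a worker thread with a larger stack.
class PossessiveDemo {
    // Greedy repeated group, modelled on a fragment of the question's regex.
    static final Pattern GREEDY = Pattern.compile("(\\\\\\S+[ ])+\\{");
    // Same fragment with possessive quantifiers ("++"), which never backtrack.
    static final Pattern POSSESSIVE = Pattern.compile("(\\\\\\S++[ ])++\\{");

    // Builds "{\af0 \af0 ... {" with the control word repeated n times.
    static String sample(int n) {
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < n; i++) {
            sb.append("\\af0 ");
        }
        return sb.append('{').toString();
    }

    // Runs find() on a thread that requests the given stack size.
    // Returns false if nothing was found or if the stack still overflowed.
    static boolean findOnBigStack(Pattern p, String input, long stackBytes) {
        AtomicBoolean found = new AtomicBoolean(false);
        Thread worker = new Thread(null, () -> {
            try {
                found.set(p.matcher(input).find());
            } catch (StackOverflowError e) {
                found.set(false);
            }
        }, "regex-worker", stackBytes);
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return found.get();
    }

    public static void main(String[] args) {
        String input = sample(20000);
        try {
            System.out.println("greedy, default stack: "
                    + GREEDY.matcher(input).find());
        } catch (StackOverflowError e) {
            System.out.println("greedy, default stack: StackOverflowError");
        }
        System.out.println("possessive, default stack: "
                + POSSESSIVE.matcher(input).find());
        System.out.println("greedy, 256 MB stack requested: "
                + findOnBigStack(GREEDY, input, 256L * 1024 * 1024));
    }
}
```

Possessive quantifiers only give the same result when the pattern never needs to give characters back. That holds for this fragment, because \S cannot match the space that ends each group, but it should be checked for each quantifier before changing the real regexes.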