Java java.util.regex.MatchResult counter problems with Scanner
I'm using a java.util.Scanner to find all occurrences of a given regex in a big string.
import java.util.Scanner;
import java.util.regex.MatchResult;

// body is the big input string, pattern is the line-break regex
Scanner sc = new Scanner(body);
sc.useDelimiter("");
while (sc.findWithinHorizon(pattern, 0) != null)
{
    MatchResult mr = sc.match();
    System.out.println("Match string: " + mr.group());
    System.out.println("Match string using indexes: " + body.substring(mr.start(), mr.end()));
}
The strange thing is that after a certain number of scans, the group() method returns the correct occurrence, while the start() and end() methods return wrong indexes, as if the scan had restarted from the beginning of the file. The regex is multiline (I use it to detect line breaks: "\\r\\n|[\\n\\r\\u2028\\u2029\\u0085]").
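To make the line-break regex concrete, here is a small sketch (the class and method names are hypothetical) that counts the breaks it finds. Because \r\n is tried before the character class, a Windows line ending counts as one break rather than two.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineBreakRegexDemo {
    // The line-break regex from the question: CRLF first, then any single break character
    static final Pattern LINE_BREAK =
            Pattern.compile("\\r\\n|[\\n\\r\\u2028\\u2029\\u0085]");

    // Counts how many line breaks the regex finds in the given text
    static int countLineBreaks(String text) {
        Matcher m = LINE_BREAK.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // CRLF, LF, CR, LINE SEPARATOR (U+2028), NEXT LINE (U+0085)
        String text = "a\r\nb\nc\rd\u2028e\u0085f";
        System.out.println(countLineBreaks(text)); // prints 5
    }
}
```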
Do you have any hint? Could it be related to the "horizon" parameter (I've tried different values for it)?
For more details: it seems related to the size of the file (more than 1000 chars). After about 1000 characters the counter restarts from 0 (e.g. the first wrong index pair, occurring after 1003:1020, is reported as 3:120).
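A minimal reproducer sketch, assuming a synthetic input of 200 numbered lines (about 1,690 characters in total); buildBody and the class name are hypothetical. It compares the start() values reported by Scanner.match() with those of a Matcher run directly on the same string. The two lists are expected to agree for the first matches and to diverge once the input grows past the Scanner's internal buffer. The lines deliberately have varying lengths: with fixed-length lines, shifted indexes could still land on a "\n" and hide the problem.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScannerOffsetRepro {
    static final Pattern LINE_BREAK =
            Pattern.compile("\\r\\n|[\\n\\r\\u2028\\u2029\\u0085]");

    // Hypothetical test input: 200 lines of varying length, well over 1024 chars in total
    static String buildBody() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            sb.append("line ").append(i).append('\n');
        }
        return sb.toString();
    }

    // start() offsets as reported by Scanner.match()
    static List<Integer> scannerStarts(String body) {
        List<Integer> starts = new ArrayList<>();
        Scanner sc = new Scanner(body);
        while (sc.findWithinHorizon(LINE_BREAK, 0) != null) {
            MatchResult mr = sc.match();
            starts.add(mr.start());
        }
        sc.close();
        return starts;
    }

    // start() offsets as reported by a Matcher running directly on body
    static List<Integer> matcherStarts(String body) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = LINE_BREAK.matcher(body);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String body = buildBody();
        List<Integer> fromScanner = scannerStarts(body);
        List<Integer> fromMatcher = matcherStarts(body);
        int n = Math.min(fromScanner.size(), fromMatcher.size());
        for (int i = 0; i < n; i++) {
            if (!fromScanner.get(i).equals(fromMatcher.get(i))) {
                System.out.println("First divergence at match " + i
                        + ": Scanner reports " + fromScanner.get(i)
                        + ", actual offset is " + fromMatcher.get(i));
                return;
            }
        }
        System.out.println("No divergence");
    }
}
```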
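Scanner uses an internal buffer of 1024 characters. The indexes returned by match().start() and match().end() are positions in that buffer, not in your original string, so once the Scanner discards already-consumed text to make room for more input, they no longer line up with body. Use Pattern instead: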
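import java.util.regex.Matcher;
import java.util.regex.Pattern;

Matcher matcher = Pattern.compile(...).matcher(body); // ... = your regex
while (matcher.find()) {
    int start = matcher.start(); // absolute offsets into body
    int end = matcher.end();
}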
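A fuller, self-contained sketch of the Pattern/Matcher approach, assuming the question's line-break regex and a hypothetical test string of 200 CRLF-terminated lines (findOffsets and the class name are illustrative). Because the Matcher runs over the whole string rather than a sliding buffer, start() and end() are absolute offsets into body, so body.substring(start, end) always reproduces the match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineBreakOffsets {
    // Same line-break regex as in the question
    static final Pattern LINE_BREAK =
            Pattern.compile("\\r\\n|[\\n\\r\\u2028\\u2029\\u0085]");

    // Returns {start, end} pairs; offsets are absolute positions in body
    static List<int[]> findOffsets(String body) {
        List<int[]> offsets = new ArrayList<>();
        Matcher matcher = LINE_BREAK.matcher(body);
        while (matcher.find()) {
            offsets.add(new int[] {matcher.start(), matcher.end()});
        }
        return offsets;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            sb.append("line ").append(i).append("\r\n");
        }
        String body = sb.toString();
        List<int[]> offsets = findOffsets(body);
        // Every substring taken with the reported offsets must be the CRLF that matched
        for (int[] o : offsets) {
            if (!body.substring(o[0], o[1]).equals("\r\n")) {
                throw new IllegalStateException("Bad offsets: " + o[0] + ":" + o[1]);
            }
        }
        System.out.println(offsets.size() + " line breaks found"); // prints "200 line breaks found"
    }
}
```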