
Repeating regex pattern in Java

I want to read a text file and use a regex to match its contents and split them into strings for two priority queues, as part of a heap-based priority queue task scheduler. First, though, I need to check that the file, which I read with a Scanner, is in the right format: each task starts with an alphanumeric name, followed by a non-negative integer (the arrival time) and a natural number (the deadline time). The following is input in the right format:

task1 2 3 task2 2 3 task3 2 3 task4 4 5 task5 4 5
task6 7 9 task7 7 9 task8 7 9 task9 7 9
task10 7 9 task11 7 9 task12 7 9 task13 7 9
task14 7 9 task15 7 9 task16 10 11 task17 10 11
task18 10 11 task19 10 11  task20 10 12

I tried the following regex to check whether the format is right, but it only matches the first task's attributes. As soon as the input continues to the next task, where the format repeats, the regex fails. Any idea what is wrong with my regex?

(^\s*[a-zA-Z0-9]*\s+\d+\s+\d+\s*){1,}

^ anchors the start, followed by \\s*, any whitespace 0 or more times

[a-zA-Z0-9]* is alphanumeric characters 0 or more times, referring to the task name

\\s+ is the whitespace between the different task attributes

\\d+ is the arrival time and the deadline time

\\s* is the whitespace, 0 or more times, that ends one task and separates it from the next

{1,} after the () brackets sets the minimum number of repeats to 1, with no maximum

The problem is ^, which requires the match to be at the start of the input sequence. Because it sits inside the repeated group, every repetition after the first has to satisfy that condition too, and none of them can.

Try moving the first part out of the group:

^\s*([a-zA-Z0-9]*\s+\d+\s+\d+\s*){1,}

Btw, {1,} can be replaced with a single +.
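To see the difference, here is a minimal sketch that runs both patterns through Matcher.matches(). The class name TaskFormatCheck and the helper isValid are made up for illustration, and the sample strings are shortened versions of the input in the question.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question or answer.
class TaskFormatCheck {
    // Pattern from the question: ^ sits inside the repeated group.
    static final Pattern ORIGINAL =
            Pattern.compile("(^\\s*[a-zA-Z0-9]*\\s+\\d+\\s+\\d+\\s*){1,}");

    // Suggested fix: ^ and the leading \s* moved out of the group, {1,} written as +.
    static final Pattern FIXED =
            Pattern.compile("^\\s*([a-zA-Z0-9]*\\s+\\d+\\s+\\d+\\s*)+");

    // matches() succeeds only if the pattern covers the entire input.
    static boolean isValid(Pattern pattern, String input) {
        return pattern.matcher(input).matches();
    }

    public static void main(String[] args) {
        String oneTask = "task1 2 3";
        String manyTasks = "task1 2 3 task2 2 3\ntask6 7 9";

        System.out.println(isValid(ORIGINAL, oneTask));   // true
        System.out.println(isValid(ORIGINAL, manyTasks)); // false: second repetition cannot satisfy ^
        System.out.println(isValid(FIXED, manyTasks));    // true
    }
}
```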

Also note that whether you need to wrap the expression in ^ and $ depends on how you apply the regex. String.matches() and Matcher.matches() require the whole input to match, so the anchors are implicit. With other methods you may have to add them yourself, e.g. add a $ at the end to require that nothing follows the match (if trailing content would violate your file format).
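A small sketch of that difference, assuming the same task pattern without any anchors as the starting point. The class AnchoringDemo and its method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only Pattern and Matcher are real API.
class AnchoringDemo {
    // The task pattern with no anchors at all.
    static final String BODY = "\\s*([a-zA-Z0-9]*\\s+\\d+\\s+\\d+\\s*)+";

    // matches() must consume the whole input, so ^ and $ are not needed.
    static boolean wholeInput(String input) {
        return Pattern.compile(BODY).matcher(input).matches();
    }

    // find() accepts a match anywhere, so the anchors are written explicitly.
    static boolean wholeInputWithFind(String input) {
        return Pattern.compile("^" + BODY + "$").matcher(input).find();
    }

    // find() without anchors: true as soon as one well-formed task occurs somewhere.
    static boolean anywhere(String input) {
        return Pattern.compile(BODY).matcher(input).find();
    }

    public static void main(String[] args) {
        String bad = "task1 2 3 oops"; // trailing token breaks the format
        System.out.println(wholeInput(bad));         // false
        System.out.println(wholeInputWithFind(bad)); // false
        System.out.println(anywhere(bad));           // true
    }
}
```

For format validation, the first two variants behave the same on this input; the unanchored find() is the one that lets malformed trailing text slip through.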

If you want to extract the matches as well, you'd need a slightly different approach, i.e. use Matcher.find() and remove the last part ( {1,} ), so that each call to find() returns one task.
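One way this could look is sketched below. It is an illustration under a few assumptions, not the only way to do it: the class TaskExtractor and the method extract are invented names, each attribute gets its own capture group, and the name uses + instead of * so that an empty task name is never returned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
class TaskExtractor {
    // One task per match: group 1 = name, group 2 = arrival time, group 3 = deadline.
    // Assumption: [a-zA-Z0-9]+ (one or more) rather than the question's *.
    static final Pattern TASK =
            Pattern.compile("([a-zA-Z0-9]+)\\s+(\\d+)\\s+(\\d+)");

    // Returns each task as a {name, arrival, deadline} string triple.
    static List<String[]> extract(String input) {
        List<String[]> tasks = new ArrayList<>();
        Matcher m = TASK.matcher(input);
        while (m.find()) { // each find() advances to the next task
            tasks.add(new String[] { m.group(1), m.group(2), m.group(3) });
        }
        return tasks;
    }

    public static void main(String[] args) {
        for (String[] t : extract("task1 2 3 task2 2 3\ntask20 10 12")) {
            System.out.println(t[0] + " arrives at " + t[1] + ", deadline " + t[2]);
        }
    }
}
```

Since find() simply skips text that does not match, it makes sense to validate the whole input first and only then extract.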
