Java Regex Numbered multiline list

I want to parse a document and extract each element of a numbered list. For example, I have this:

 1. I like to blah
    and blah
 2. But also to blah 
    and blah

I would like to extract each element from the list, like [1. text for item1, 2. text from item2]. Before, I used a regular expression like "[0-9].*;" because I thought each list item ended with ;, but that is not always true. So I would like a regex that extracts the text without relying on it ending with ";". This is what I tried:

String regexLineNumber = "[0-9]..*;";
String[] splitted = inputData.split(regexLineNumber);
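One way to avoid depending on a trailing ";" is to match each item instead of splitting on a terminator. The following is a minimal sketch, assuming every item starts on its own line with one or more digits, a period and whitespace; the class and method names are hypothetical. With MULTILINE, `^` matches at each line start, and with DOTALL, `.` also matches line breaks, so an item runs across lines up to the next numbered line or the end of the input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumberedListRegex {
    // Assumption: an item begins at a line start with "<digits>." followed by whitespace.
    // (?m) = MULTILINE, (?s) = DOTALL; the lazy .*? stops at the next item start or at \z.
    private static final Pattern ITEM =
            Pattern.compile("(?ms)^\\s*(\\d+\\.\\s.*?)(?=^\\s*\\d+\\.\\s|\\z)");

    static List<String> extractItems(String inputData) {
        List<String> items = new ArrayList<>();
        Matcher m = ITEM.matcher(inputData);
        while (m.find()) {
            // Collapse line breaks and indentation inside an item into single spaces
            items.add(m.group(1).replaceAll("\\s+", " ").trim());
        }
        return items;
    }

    public static void main(String[] args) {
        String input = " 1. I like to blah\n    and blah\n 2. But also to blah \n    and blah\n";
        System.out.println(extractItems(input));
        // prints [1. I like to blah and blah, 2. But also to blah and blah]
    }
}
```

Because an item boundary must be at a line start and needs whitespace after the period, numbers inside the text (for example "Buy 2 apples" or a line starting with "3.14") do not start a new item.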

In general, I would try to avoid regular expressions when you can. They can be quite memory-inefficient, and in most circumstances they are just used as a shortcut. In this situation, you could easily create a BufferedReader and read each line, looking for the line that starts with the next item number. Something like:

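import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class NumberedListSplitter {
    // Collects each numbered item; continuation lines are joined onto the current item.
    static List<String> split(BufferedReader reader) throws IOException {
        int nextNum = 2; // item 1 is accumulated until a line starting with "2." appears
        StringBuilder curRecord = new StringBuilder();
        List<String> elements = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.trim().startsWith(nextNum + ".")) {
                // The next item begins: store the finished one and start a new buffer
                elements.add(curRecord.toString().trim());
                curRecord = new StringBuilder();
                nextNum++;
            }
            // Add a space so words on adjacent lines don't run together
            curRecord.append(line.trim()).append(' ');
        }
        // StringBuilder has no trim(); convert to String before checking for leftovers
        if (!curRecord.toString().trim().isEmpty()) {
            elements.add(curRecord.toString().trim());
        }
        return elements;
    }
}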

I suggest you use a regex that would allow numbers in the middle or at the end of a sentence.

(?<=[\n\r\s]*|^)(\d\.[^\d]*)

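Also remember to iterate over all matches: in Java, call `Matcher.find()` in a loop, since `Matcher.matches()` only succeeds if the entire input matches the pattern (there is no `findall`, which is Python).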
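A minimal sketch of how the suggested pattern might be used from Java, assuming the input is already in a String; the class and method names are hypothetical. The lookbehind is left out because `[\n\r\s]*` can match zero characters, so it always succeeds and does not restrict which text is matched. Note that `[^\d]*` stops at the first digit, so an item that itself contains a number is cut short, as the second call shows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnswerPatternDemo {
    // Core of the suggested pattern as a Java string literal (lookbehind omitted)
    private static final Pattern ITEM = Pattern.compile("\\d\\.[^\\d]*");

    static List<String> findItems(String text) {
        List<String> items = new ArrayList<>();
        Matcher m = ITEM.matcher(text);
        // find() returns each successive match; matches() would test the whole input at once
        while (m.find()) {
            items.add(m.group().replaceAll("\\s+", " ").trim());
        }
        return items;
    }

    public static void main(String[] args) {
        System.out.println(findItems(" 1. I like to blah\n    and blah\n 2. But also to blah \n    and blah\n"));
        // prints [1. I like to blah and blah, 2. But also to blah and blah]
        System.out.println(findItems("1. Buy 2 apples\n2. Eat them"));
        // prints [1. Buy, 2. Eat them]  (" 2 apples" is lost at the digit)
    }
}
```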

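Disclaimer: the technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.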

 
© 2020-2024 STACKOOM.COM