
Parsing flat file with repeating section using regex

I have a flat file with data in the following format:

1:00 PM
Name                UniqueID 
ABX 298819 12       519440AD3

12:00 AM
Name                UniqueID 
AX1 239949 01       119440AD3

Each section starts with a time, followed by a header line and then the values. I am trying to capture each of these sections with a regex, so that I get:

section 1:
1:00 PM
Name                UniqueID 
ABX 298819 12       519440AD3

section 2:
12:00 AM
Name                UniqueID 
AX1 239949 01       119440AD3

I then want to parse each of these sections into Java objects of the following classes:

public class Section {
    String timestamp;
    List<Row> rows;
}

public class Row {
    String name;
    String uniqueId;
}

However, I am not able to extract the text between two consecutive regex matches. Below is the regular expression I tried:

((1[012]|[1-9]):[0-5][0-9](\\s)?(?i)(am|pm))(?=.*)

But it returns only the time values:

10:30 AM
1:00 PM
1:30 PM
10:30 AM
1:00 PM
1:30 PM

I also tried adding Pattern.MULTILINE to the Pattern, but that didn't work either.

Assuming the structure you showed us repeats throughout the file, there are four types of lines, always in the same order: timestamp, header, data, empty line.
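Because the line types always come in that order, one option is to walk the file line by line and switch on the kind of line, rather than capturing a whole section with a single regex. The following is only a sketch under that assumption. The class name SectionParser and the method parse are made up for illustration, Section and Row mirror the classes from the question (with rows initialised to an empty list), and the time pattern is adapted from the one in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original question or answer.
class SectionParser {

    // Mirrors the Row class from the question.
    static class Row {
        String name;
        String uniqueId;
    }

    // Mirrors the Section class from the question; rows starts out empty.
    static class Section {
        String timestamp;
        List<Row> rows = new ArrayList<>();
    }

    // A whole line such as "1:00 PM" or "12:00 AM" opens a new section.
    private static final Pattern TIME =
            Pattern.compile("(?i)(1[0-2]|[1-9]):[0-5][0-9]\\s?(am|pm)");

    static List<Section> parse(String text) {
        List<Section> sections = new ArrayList<>();
        Section current = null;
        boolean headerSeen = false;

        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;                           // blank separator line
            }
            if (TIME.matcher(line).matches()) {
                current = new Section();            // timestamp line
                current.timestamp = line;
                sections.add(current);
                headerSeen = false;
            } else if (current == null) {
                continue;                           // text before the first timestamp
            } else if (!headerSeen) {
                headerSeen = true;                  // header line, skipped
            } else {
                // Data line: split at the last run of whitespace, so the
                // final token becomes the ID and the rest becomes the name.
                String[] parts = line.split("\\s+(?=\\S+$)");
                if (parts.length == 2) {
                    Row row = new Row();
                    row.name = parts[0];
                    row.uniqueId = parts[1];
                    current.rows.add(row);
                }
            }
        }
        return sections;
    }
}
```

Calling SectionParser.parse on the sample from the question should give two sections, each holding one row. The sketch assumes that every line after the header is a data line until the next timestamp, so a section may contain several rows.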

For example, to separate the unique ID from the name in a data line, you could try:

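String third = "ABX 298819 12       519440AD3";
// Keep only the last run of word characters: "519440AD3".
String uniqueId = third.replaceAll(".*\\s+(\\w+)", "$1");
// Keep everything before the last token; the greedy (.*) also captures the
// padding spaces, so trim them: "ABX 298819 12".
String name = third.replaceAll("(.*)\\s+\\w+", "$1").trim();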
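As for the regex in the question: the trailing (?=.*) is a zero-width lookahead that always succeeds, so it adds nothing to the match, and that is why only the time values come back. If you would still like a regex to cut the text into sections, one possibility is to split on a lookahead that matches just before each timestamp line. This is a sketch, not a definitive solution. The class name SectionSplitter and the method split are hypothetical, and the time pattern is adapted from the one in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original question or answer.
class SectionSplitter {

    // Zero-width match at the start of every line that consists only of a
    // time such as "1:00 PM". (?m) makes ^ and $ match at line boundaries.
    private static final Pattern SECTION_START = Pattern.compile(
            "(?m)(?=^(?:1[0-2]|[1-9]):[0-5][0-9][ \\t]?(?i:am|pm)[ \\t]*$)");

    static List<String> split(String text) {
        List<String> sections = new ArrayList<>();
        for (String chunk : SECTION_START.split(text)) {
            String trimmed = chunk.trim();
            if (!trimmed.isEmpty()) {
                sections.add(trimmed);   // one timestamp plus the lines below it
            }
        }
        return sections;
    }
}
```

Each returned string starts with the timestamp and runs up to, but not including, the next one, so it can then be handed to whatever code fills the Section and Row objects. Any text that precedes the first timestamp would come back as its own chunk.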
