Parsing a flat file with repeating sections using regular expressions

Question

I have a flat file with data in the following format:

1:00 PM
Name                UniqueID 
ABX 298819 12       519440AD3

12:00 AM
Name                UniqueID 
AX1 239949 01       119440AD3

Each section starts with a time, followed by a header line and then the values. I am trying to capture each of these sections with a regex, so that I get:

section 1:
1:00 PM
Name                UniqueID 
ABX 298819 12       519440AD3

section 2:
12:00 AM
Name                UniqueID 
AX1 239949 01       119440AD3

Later I want to parse each of these sections into Java objects of the classes given below:

public class Section {
    String timestamp;
    List<Row> rows;
}

public class Row {
    String name;
    String uniqueId;
}

But I am not able to extract the text between two successive regex matches. Below is the regular expression I tried:

((1[012]|[1-9]):[0-5][0-9](\\s)?(?i)(am|pm))(?=.*)

But it returns only the time values:

10:30 AM
1:00 PM
1:30 PM
10:30 AM
1:00 PM
1:30 PM

I even tried adding Pattern.MULTILINE to the Pattern, but that didn't work either.

Answer 1

Assuming the structure you showed us repeats throughout the file, there are four types of lines in sequence: timestamp, header, data, and empty line. You can therefore read the file line by line and handle each line according to its type.
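The line sequence described above can be sketched as a small line-by-line parser. This is only an illustration under some assumptions: the class name `FlatFileParser` and its `parse` method are made up for this example, `Section` and `Row` are re-declared from the question so the snippet compiles on its own, the first non-blank line after a timestamp is taken to be the header, and the unique ID is taken to be the last space-separated token of a data line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Re-declared from the question so the sketch is self-contained.
class Row {
    String name;
    String uniqueId;
}

class Section {
    String timestamp;
    List<Row> rows = new ArrayList<>();
}

// Hypothetical helper, not part of any library.
class FlatFileParser {
    // Same time-of-day shape as the regex in the question; used with matches(),
    // so it has to cover the whole (trimmed) line.
    private static final Pattern TIME =
            Pattern.compile("(1[012]|[1-9]):[0-5][0-9]\\s?(?i:am|pm)");

    static List<Section> parse(List<String> lines) {
        List<Section> sections = new ArrayList<>();
        Section current = null;
        boolean headerPending = false;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;                          // empty separator line
            }
            if (TIME.matcher(line).matches()) {    // timestamp: start a new section
                current = new Section();
                current.timestamp = line;
                sections.add(current);
                headerPending = true;
            } else if (headerPending) {
                headerPending = false;             // header line: skip it
            } else if (current != null) {          // data line
                int split = line.lastIndexOf(' ');
                if (split < 0) {
                    continue;                      // assumption: ignore lines with a single token
                }
                Row row = new Row();
                row.name = line.substring(0, split).trim();
                row.uniqueId = line.substring(split + 1);
                current.rows.add(row);
            }
        }
        return sections;
    }
}
```

With the file contents read into a list (for instance via `java.nio.file.Files.readAllLines`), `FlatFileParser.parse(lines)` returns one `Section` per timestamp, each holding the rows that follow it. A section with several data lines needs no extra handling, because every line that is not a timestamp, header, or blank is treated as data.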

For example, to separate the unique ID from the name on a data line, you could try:

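String third = "ABX 298819 12       519440AD3";
// Keep only the last run of word characters (the unique ID).
String uniqueId = third.replaceAll(".*\\s+(\\w+)", "$1");
// Keep everything before the last whitespace run; the anchors and the lazy
// quantifier stop the padding spaces from ending up in the name.
String name = third.replaceAll("^(.*?)\\s+\\w+$", "$1");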
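If you would rather capture each whole section with a single regex, as the question attempts, note that the lookahead `(?=.*)` is zero-width and adds nothing to the match, and that `.` does not cross line breaks unless `Pattern.DOTALL` is set (`Pattern.MULTILINE` only changes what `^` and `$` match). A possible sketch follows; the class name `SectionSplitter` and its `split` method are hypothetical, and it assumes each timestamp sits alone on its own line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class SectionSplitter {
    // Time-of-day shape taken from the question, without capturing groups.
    private static final String TIME = "(?:1[012]|[1-9]):[0-5][0-9]\\s?(?i:am|pm)";

    // Group 1: the timestamp line.
    // Group 2: everything up to the next timestamp line or the end of input.
    private static final Pattern SECTION = Pattern.compile(
            "^(" + TIME + ")[ \\t]*\\R(.*?)(?=^" + TIME + "[ \\t]*$|\\z)",
            Pattern.MULTILINE | Pattern.DOTALL);

    // Returns one {timestamp, body} pair per section, in file order.
    static List<String[]> split(String text) {
        List<String[]> sections = new ArrayList<>();
        Matcher m = SECTION.matcher(text);
        while (m.find()) {
            sections.add(new String[] { m.group(1), m.group(2).trim() });
        }
        return sections;
    }
}
```

Each returned pair holds the timestamp and the trimmed body (header plus data lines), which can then be split into lines and processed as shown above. The lazy `(.*?)` together with the lookahead is what stops a section at the next timestamp instead of running to the end of the file.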
