Parsing flat file with repeating section using regex
I have a flat file with data in the following format:
1:00 PM
Name UniqueID
ABX 298819 12 519440AD3
12:00 AM
Name UniqueID
AX1 239949 01 119440AD3
Each section starts with a time, followed by a header line and then the values. I am trying to capture each of these sections with a regex, so that I get:
section 1:
1:00 PM
Name UniqueID
ABX 298819 12 519440AD3
section 2:
12:00 AM
Name UniqueID
AX1 239949 01 119440AD3
Later I want to parse each of these sections into a Java class object, as given below:
public class Section {
String timestamp;
List<Row> rows;
}
public class Row {
String name;
String uniqueId;
}
But I am not able to extract the text between two positive regex matches. Below is the regular expression I tried:
((1[012]|[1-9]):[0-5][0-9](\\s)?(?i)(am|pm))(?=.*)
But it returns only the time values:
10:30 AM
1:00 PM
1:30 PM
10:30 AM
1:00 PM
1:30 PM
I even tried adding Pattern.MULTILINE to the Pattern, but that didn't work either.
Assuming the structure you showed us repeats throughout the file, there are four types of lines in sequence: timestamp, header, data, empty line.
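As for why the attempted pattern returns only the times: `(?=.*)` is a lookahead, so it consumes nothing, and `.` does not match line terminators unless `Pattern.DOTALL` is set; `Pattern.MULTILINE` only changes where `^` and `$` match. If you still want a regex to cut the file into sections, one possible sketch is to split at every line that consists of a time. The class name `SectionSplitter` and the method `split` are made up for illustration, and the sketch assumes a time never appears alone on a data line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class SectionSplitter {
    // Zero-width split point: the start of any line that holds only a
    // 12-hour time such as "1:00 PM". (?i) = case-insensitive,
    // (?m) = ^ and $ match at line boundaries.
    private static final Pattern SECTION_START = Pattern.compile(
            "(?im)^(?=(?:1[0-2]|[1-9]):[0-5][0-9]\\s?(?:am|pm)[ \\t]*$)");

    // Returns one string per section: timestamp line, header line, data lines.
    static List<String> split(String text) {
        List<String> sections = new ArrayList<>();
        for (String chunk : SECTION_START.split(text)) {
            String trimmed = chunk.trim();
            if (!trimmed.isEmpty()) {   // drop anything empty before the first time
                sections.add(trimmed);
            }
        }
        return sections;
    }
}
```

Each returned string can then be broken into lines and handled according to the line types listed above.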
For example, if you want to separate the unique ID from the name, you could try:
String third = "ABX 298819 12 519440AD3";
String uniqueId = third.replaceAll(".*\\s+(\\w+)", "$1");
String name = third.replaceAll("(.*)\\s+\\w+", "$1");
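Putting this together, here is a rough sketch of a line-by-line parser that fills the `Section` and `Row` classes from the question. It is only one way to do it and rests on a few assumptions: the constructors and the `SectionParser` class are added for illustration, the first non-empty line after a timestamp is always the header, and the unique ID is the last whitespace-separated token of a data line (the same split as above, done with one `Matcher` instead of two `replaceAll` calls).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Row {
    final String name;
    final String uniqueId;
    Row(String name, String uniqueId) { this.name = name; this.uniqueId = uniqueId; }
}

class Section {
    final String timestamp;
    final List<Row> rows = new ArrayList<>();
    Section(String timestamp) { this.timestamp = timestamp; }
}

class SectionParser {
    // A whole line that is a 12-hour time, e.g. "1:00 PM" or "12:00 am".
    private static final Pattern TIME =
            Pattern.compile("(?i)(?:1[0-2]|[1-9]):[0-5][0-9]\\s?(?:am|pm)");
    // Everything up to the last whitespace run is the name; the last token is the ID.
    private static final Pattern DATA = Pattern.compile("(.*)\\s+(\\w+)");

    static List<Section> parse(List<String> lines) {
        List<Section> sections = new ArrayList<>();
        Section current = null;
        boolean headerSkipped = false;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;                       // empty line between sections
            }
            if (TIME.matcher(line).matches()) { // timestamp starts a new section
                current = new Section(line);
                sections.add(current);
                headerSkipped = false;
            } else if (current == null) {
                continue;                       // ignore text before the first timestamp
            } else if (!headerSkipped) {
                headerSkipped = true;           // header line, e.g. "Name UniqueID"
            } else {
                Matcher m = DATA.matcher(line); // data line
                if (m.matches()) {
                    current.rows.add(new Row(m.group(1).trim(), m.group(2)));
                }
            }
        }
        return sections;
    }
}
```

The lines could come from something like `Files.readAllLines(Paths.get("data.txt"))`, where the file name is a placeholder.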