
Parsing flat file with repeating section using regex

I have a flat file with data in the following format:

1:00 PM
Name                UniqueID 
ABX 298819 12       519440AD3

12:00 AM
Name                UniqueID 
AX1 239949 01       119440AD3

Each section starts with a time, followed by a header line and then the values. I am trying to capture each of these sections with a regex, so that I get:

section 1:
1:00 PM
Name                UniqueID 
ABX 298819 12       519440AD3

section 2:
12:00 AM
Name                UniqueID 
AX1 239949 01       119440AD3

Later I want to parse each of these sections into Java objects of the classes given below:

public class Section {
    String timestamp;
    List<Row> rows;
}

public class Row {
    String name;
    String uniqueId;
}

However, I am not able to extract the text between two consecutive regex matches. Below is the regular expression I tried:

((1[012]|[1-9]):[0-5][0-9](\\s)?(?i)(am|pm))(?=.*)

But it returns only the time values:

10:30 AM
1:00 PM
1:30 PM
10:30 AM
1:00 PM
1:30 PM

I even tried adding Pattern.MULTILINE to Pattern but it didn't work either.

Assuming the structure you showed us repeats throughout the file, there are four types of lines, always in the same sequence: timestamp, header, data, and an empty line.
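As for why the pattern in the question returns only the times: (?=.*) is a zero-width lookahead, so it checks the rest of the line without adding it to the match, and . does not cross line breaks unless Pattern.DOTALL is set. Pattern.MULTILINE only changes where ^ and $ match. One way to get the whole block is to split the text just before each timestamp line instead. The following is a minimal sketch under that assumption; SectionSplitter and split are names made up for this illustration, and the time pattern is adapted from the one in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class SectionSplitter {
    // Zero-width split point: the start of any line that contains only a
    // 12-hour time such as "1:00 PM" (m = MULTILINE, i = case-insensitive).
    private static final Pattern SECTION_START = Pattern.compile(
            "(?mi)^(?=(?:1[0-2]|[1-9]):[0-5][0-9]\\s?(?:am|pm)[ \\t]*$)");

    // Returns one string per section: timestamp, header and data lines.
    static List<String> split(String text) {
        List<String> sections = new ArrayList<>();
        for (String chunk : SECTION_START.split(text)) {
            String trimmed = chunk.trim();
            if (!trimmed.isEmpty()) { // drop empty leading or trailing pieces
                sections.add(trimmed);
            }
        }
        return sections;
    }
}
```

Each returned string starts with its timestamp line, so the lines inside it can then be handled by position: first the time, then the header, then the data.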

Once you have a data line, you can separate the unique ID from the name. For example, you could try:

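String third = "ABX 298819 12       519440AD3";
// The last whitespace-separated token is the unique ID.
String uniqueId = third.replaceAll(".*\\s+(\\w+)", "$1");
// Requiring a non-space character before the gap keeps the padding out of the name.
String name = third.replaceAll("(.*\\S)\\s+\\w+", "$1");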
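Putting the pieces together, the file can also be read line by line and turned into objects directly. This is only a sketch, assuming the four-line structure described above and a unique ID that never contains spaces. The Row and Section classes mirror the ones in the question, with constructors added here for convenience; SectionParser and parse are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SectionParser {
    // Same fields as in the question; constructors are an addition for this sketch.
    static class Row {
        final String name;
        final String uniqueId;
        Row(String name, String uniqueId) { this.name = name; this.uniqueId = uniqueId; }
    }

    static class Section {
        final String timestamp;
        final List<Row> rows = new ArrayList<>();
        Section(String timestamp) { this.timestamp = timestamp; }
    }

    private static final Pattern TIME =
            Pattern.compile("(?i)(?:1[0-2]|[1-9]):[0-5][0-9]\\s?(?:am|pm)");
    // Group 1: everything before the last run of whitespace (the name).
    // Group 2: the final token (the unique ID).
    private static final Pattern DATA = Pattern.compile("(.*\\S)\\s+(\\S+)");

    static List<Section> parse(String text) {
        List<Section> sections = new ArrayList<>();
        Section current = null;
        boolean headerPending = false;
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // blank separator between sections
            }
            if (TIME.matcher(line).matches()) {
                current = new Section(line); // a timestamp opens a new section
                sections.add(current);
                headerPending = true;
            } else if (current == null) {
                continue; // ignore anything before the first timestamp
            } else if (headerPending) {
                headerPending = false; // skip the "Name UniqueID" header line
            } else {
                Matcher m = DATA.matcher(line);
                if (m.matches()) {
                    current.rows.add(new Row(m.group(1), m.group(2)));
                }
            }
        }
        return sections;
    }
}
```

Because every non-header line after a timestamp is treated as data, this also copes with sections that contain more than one row.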
