
Regex parser step by step in Java

This is my data and its pattern:

// _23.02_ANTALYA____________FRANKFURT___________DE_7461_18:20-21:00________________
public static final String FLIGHT_DEFAULT_PATTERN = "\\s+\\d{2}.\\d{2}\\s[A-Z]+\\s+[A-Z]+\\s+[A-Z\\s]{3}[\\d\\s]{5}\\d{2}:\\d{2}-\\d{2}:\\d{2}\\s+";

Underscores stand for space characters. Now I need a class that maps every regex term to the piece of data it matched. For example:

\\s+ = " "
\\d{2} = "23"
. = "."
\\d{2} = "02"
\\s = " "
[A-Z]+ = "ANTALYA"

and so on. The results must be in the same order as the terms in the pattern.

How can I do this, or is there a library for it?

As @devnull mentioned, you should use capturing groups:

(\s+)(\d{2})(.)(\d{2})(\s)([A-Z]+)(\s+)([A-Z]+)(\s+)([A-Z\s]{3})([\d\s]{5})(\d{2}:\d{2})(-)(\d{2}:\d{2})(\s+)

See the full explanation of this regular expression on Regex101.

You would then use something like the following to match the text and extract the individual values:

String text = " 23.02 ANTALYA            FRANKFURT            DE 7461 18:20-21:00                 ";
Pattern pattern = Pattern.compile("(\\s+)(\\d{2})(.)(\\d{2})(\\s)([A-Z]+)(\\s+)([A-Z]+)(\\s+)([A-Z\\s]{3})([\\d\\s]{5})(\\d{2}:\\d{2})(-)(\\d{2}:\\d{2})(\\s+)");
Matcher matcher = pattern.matcher(text);
if (matcher.find()) {
    // Group 0 is the whole match; capturing groups run from 1 to groupCount() inclusive.
    for (int i = 1; i <= matcher.groupCount(); i++) {
        System.out.println(matcher.group(i));
    }
}

To make it easier to extract specific fields, you could (in Java 7 and later) use named capturing groups:

(?<LeadSpace>\s+)(?<Day>\d{2})(.)(?<Month>\d{2})...

You could then use something like the following to get each named group:

...
if (matcher.find()) {
    System.out.println(matcher.group("LeadSpace"));
    System.out.println(matcher.group("Day"));
    System.out.println(matcher.group("Month"));
    ...
}
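For reference, here is a minimal, self-contained sketch of the named-group approach applied to the whole sample line. Only `Day` and `Month` come from the snippet above; the other group names (`From`, `To`, `Carrier`, `FlightNo`, `Departure`, `Arrival`) and the class name are hypothetical, chosen just for illustration. The sketch also escapes the dot so that it matches only a literal period, which is a small deviation from the original pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NamedGroupsSketch {
    // Day and Month follow the answer above; all other group names are
    // hypothetical and only serve this illustration.
    static final Pattern FLIGHT = Pattern.compile(
          "\\s+(?<Day>\\d{2})\\.(?<Month>\\d{2})\\s"
        + "(?<From>[A-Z]+)\\s+(?<To>[A-Z]+)\\s+"
        + "(?<Carrier>[A-Z\\s]{3})(?<FlightNo>[\\d\\s]{5})"
        + "(?<Departure>\\d{2}:\\d{2})-(?<Arrival>\\d{2}:\\d{2})\\s+");

    public static void main(String[] args) {
        // Sample line from the question; underscores stand for spaces.
        String text = "_23.02_ANTALYA____________FRANKFURT___________DE_7461_18:20-21:00________________"
                .replace('_', ' ');
        Matcher m = FLIGHT.matcher(text);
        if (m.matches()) {
            System.out.println("Date:      " + m.group("Day") + "." + m.group("Month"));
            System.out.println("Route:     " + m.group("From") + " -> " + m.group("To"));
            // The fixed-width fields keep their padding, so trim before use.
            System.out.println("Flight:    " + m.group("Carrier").trim() + " " + m.group("FlightNo").trim());
            System.out.println("Departure: " + m.group("Departure"));
            System.out.println("Arrival:   " + m.group("Arrival"));
        } else {
            System.out.println("Line does not match the flight pattern");
        }
    }
}
```

Separators that carry no information (whitespace, the dot, the hyphen) are left uncaptured here; wrap them in groups as well if every term of the pattern has to be reported.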

I found a different way: I split the pattern into its pieces by hand.

// _24.02_MAURITIUS_________HAMBURG________________via:FRA_DE/LH____08:30-20:05_____
public static final List<String> FLIGHT_VIA_PATTERN = Arrays.asList( "\\s+", "\\d{2}", "\\.", "\\d{2}", "\\s+", "[A-Z]+", "\\s+", "[A-Z]+", "\\s+", "via:", "[A-Z\\s]{4}", "[A-Z]{2,3}", "/",
        "[A-Z]{2,3}", "\\s+", "\\d{2}", ":", "\\d{2}", "\\-", "\\d{2}", ":", "\\d{2}", "\\s+" );

After this I used a loop over the pieces, and everything works. This question can be closed.
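The loop itself is not shown in this answer. One way it could look is sketched below, assuming each piece is matched at the current position with `Matcher.region` and `Matcher.lookingAt`, and the position then advances to the end of that match. The class name `StepwiseMatcher` and the method `splitByPieces` are hypothetical names used only for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StepwiseMatcher {
    // Pattern pieces copied from FLIGHT_VIA_PATTERN above.
    static final List<String> FLIGHT_VIA_PATTERN = Arrays.asList(
            "\\s+", "\\d{2}", "\\.", "\\d{2}", "\\s+", "[A-Z]+", "\\s+", "[A-Z]+", "\\s+",
            "via:", "[A-Z\\s]{4}", "[A-Z]{2,3}", "/", "[A-Z]{2,3}", "\\s+",
            "\\d{2}", ":", "\\d{2}", "\\-", "\\d{2}", ":", "\\d{2}", "\\s+");

    /**
     * Matches the pieces one after another, each starting where the previous
     * one ended. Returns the text consumed by each piece, in pattern order,
     * or null if some piece does not match at its position.
     */
    static List<String> splitByPieces(String input, List<String> pieces) {
        List<String> values = new ArrayList<>();
        int pos = 0;
        for (String piece : pieces) {
            Matcher m = Pattern.compile(piece).matcher(input);
            m.region(pos, input.length());
            if (!m.lookingAt()) {   // the piece must match exactly at pos
                return null;
            }
            values.add(m.group());
            pos = m.end();          // the next piece continues from here
        }
        return values;
    }

    public static void main(String[] args) {
        // Sample line from the comment above; underscores stand for spaces.
        String line = "_24.02_MAURITIUS_________HAMBURG________________via:FRA_DE/LH____08:30-20:05_____"
                .replace('_', ' ');
        List<String> values = splitByPieces(line, FLIGHT_VIA_PATTERN);
        if (values == null) {
            System.out.println("Line does not match the pattern");
            return;
        }
        for (int i = 0; i < values.size(); i++) {
            System.out.println(FLIGHT_VIA_PATTERN.get(i) + " = \"" + values.get(i) + "\"");
        }
    }
}
```

Unlike a single compiled regex, this piecewise loop never backtracks across pieces: each piece greedily takes what it can, and a later piece cannot make an earlier one give characters back. That is fine for this fixed-width layout, but it can reject lines that the concatenated pattern would accept.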
