
Java 7 Regex and named groups with multiple patterns

I have two different sources feeding input files to my application. Their filename patterns differ, yet they contain common information that I want to retrieve.

Using regex named groups seemed convenient, as it allows for maximum code factorization. However, it has its limits: I cannot concatenate the two patterns if they use the same group names.

Example:

In other words, this:

String PATTERN_GROUP_NAME   = "name";
String PATTERN_GROUP_DATE   = "date";
String PATTERN_IMPORT_1     = "(?<" + PATTERN_GROUP_NAME + ">[a-z]{3})_(?<" + PATTERN_GROUP_DATE + ">[0-9]{14})_(stuff stuf)\\.xml";
String PATTERN_IMPORT_2     = "(stuff stuf)_(?<" + PATTERN_GROUP_DATE + ">[0-9]{14})_(?<" + PATTERN_GROUP_NAME + ">[a-z]{3})_(other stuff stuf)\\.xml";
// Both alternatives define "name" and "date", so this line throws PatternSyntaxException
Pattern universalPattern    = Pattern.compile(PATTERN_IMPORT_1 + "|" + PATTERN_IMPORT_2);
try {
  DirectoryStream<Path> list = Files.newDirectoryStream(workDirectory);
  for (Path file : list) {
    Matcher matcher = universalPattern.matcher(file.getFileName().toString());
    if (!matcher.matches()) {
      continue; // group() may only be called after a successful match
    }
    name = matcher.group(PATTERN_GROUP_NAME);
    fileDate = dateFormatter.parseDateTime(matcher.group(PATTERN_GROUP_DATE));
    (...)

will fail with a java.util.regex.PatternSyntaxException, because the named capturing groups are already defined by the first alternative.
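A minimal sketch that reproduces the failure in isolation, assuming simplified stand-ins for the two import patterns (the `stuff stuf` parts are dropped, and the class name `DuplicateGroupDemo`, the constants and the helper `compileError` are hypothetical). Each pattern compiles on its own, but joining them with `|` is rejected because `name` and `date` would each be defined twice.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class DuplicateGroupDemo {
    // Hypothetical, simplified stand-ins for PATTERN_IMPORT_1 and PATTERN_IMPORT_2.
    static final String IMPORT_1 = "(?<name>[a-z]{3})_(?<date>[0-9]{14})\\.xml";
    static final String IMPORT_2 = "(?<date>[0-9]{14})_(?<name>[a-z]{3})\\.xml";

    // Returns the exception raised when compiling the regex, or null if it compiles.
    static PatternSyntaxException compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        // Each pattern is valid on its own...
        System.out.println("IMPORT_1 ok: " + (compileError(IMPORT_1) == null));
        System.out.println("IMPORT_2 ok: " + (compileError(IMPORT_2) == null));
        // ...but the alternation defines both group names a second time.
        PatternSyntaxException e = compileError(IMPORT_1 + "|" + IMPORT_2);
        System.out.println("combined: " + (e == null ? "ok" : e.getDescription()));
    }
}
```

The exception's description names the first group that is redefined, which helps when the patterns are assembled from constants as in the question.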

What would be the most efficient / elegant way of solving this problem?

Edit:

It goes without saying, but the two patterns I can match my input files against are different enough that no input file can match both.

Use two patterns; then the group names can be the same.

You asked for efficient and elegant. Theoretically, one pattern could be more efficient, but that is irrelevant here.

First: the code will be slightly longer, but more readable (readability being a weakness of regex). That makes it easier to maintain.

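In pseudo-code:

Matcher m = firstPattern.matcher ...
if (!m.matches()) {
    m = secondPattern.matcher ...
    if (!m.matches()) {
        continue; // neither pattern matches: skip this file
    }
}
// Both patterns use the same group names, so this works whichever one matched
name = m.group(NAME_GROUP);
...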
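The two-pattern approach above can be made concrete as follows. This is a sketch, assuming the two import patterns from the question and a hypothetical helper `matchAny` (in a hypothetical class `TwoPatternsDemo`) that tries each pattern in turn. Because the patterns are compiled separately, both can reuse the group names `name` and `date`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwoPatternsDemo {
    static final String NAME_GROUP = "name";
    static final String DATE_GROUP = "date";

    // The question's two patterns, compiled separately so each may define "name" and "date".
    static final List<Pattern> PATTERNS = Arrays.asList(
            Pattern.compile("(?<" + NAME_GROUP + ">[a-z]{3})_(?<" + DATE_GROUP
                    + ">[0-9]{14})_(stuff stuf)\\.xml"),
            Pattern.compile("(stuff stuf)_(?<" + DATE_GROUP + ">[0-9]{14})_(?<" + NAME_GROUP
                    + ">[a-z]{3})_(other stuff stuf)\\.xml"));

    // Returns a matcher that matched the whole file name, or null if no pattern matches.
    static Matcher matchAny(String fileName) {
        for (Pattern p : PATTERNS) {
            Matcher m = p.matcher(fileName);
            if (m.matches()) {
                return m;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[] fileNames = {
                "abc_20141220202020_stuff stuf.xml",
                "stuff stuf_20111130121212_abc_other stuff stuf.xml",
                "unrelated.txt"
        };
        for (String fileName : fileNames) {
            Matcher m = matchAny(fileName);
            if (m == null) {
                continue; // fits neither source's naming scheme: skip it
            }
            System.out.println(m.group(NAME_GROUP) + " " + m.group(DATE_GROUP));
        }
    }
}
```

With this layout, supporting a third source would mean adding one more entry to `PATTERNS` without touching the extraction code.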

(Everyone wants to write overly clever code, but simplicity may be what is called for.)

I agree with Joop Eggen's opinion. Two patterns are simple and easily maintainable.
Just for fun, here is a one-pattern implementation for your specific case (a little bit longer and uglier).

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OnePatternDemo {
    public static void main(String[] args) {
        String[] inputs = {
                "stuff stuf_20111130121212_abc_other stuff stuf.xml",
                "stuff stuf_20111130151212_def_other stuff stuf.xml",
                "abc_20141220202020_stuff stuf.xml",
                "def_20140820202020_stuff stuf.xml"
        };
        // Lookahead: only accept names that follow one of the two known layouts
        String lookAhead = "(?=([a-z]{3}_[0-9]{14}_stuff stuf\\.xml)|(stuff stuf_[0-9]{14}_[a-z]{3}_other stuff stuf\\.xml))";
        // (A|B){2}: "name" and "date" are each defined once but can match in either order
        String onePattern = lookAhead
                + "((?<name>[a-z]{3})_(other stuff stuf)?|(stuff stuf_)?(?<date>[0-9]{14})_(stuff stuf)?){2}\\.xml";

        Pattern universalPattern = Pattern.compile(onePattern);
        for (String input : inputs) {
            Matcher matcher = universalPattern.matcher(input);
            if (matcher.find()) {
                String name = matcher.group("name");
                String fileDate = matcher.group("date");
                System.out.println("name : " + name + " fileDate: " + fileDate);
            }
        }
    }
}

The output:

name : abc fileDate: 20111130121212
name : def fileDate: 20111130151212
name : abc fileDate: 20141220202020
name : def fileDate: 20140820202020

Actually, in your case the lookahead is not necessary; it only rejects filenames that fit neither layout. The real point is that one pattern cannot define two groups with the same name, so you normally need to restructure the pattern: rewrite AB|BA as (A|B){2}.
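The AB|BA to (A|B){2} rewrite can be tried in isolation. In this sketch A and B are simplified to a three-letter name and a 14-digit date with no separators, and the class `EitherOrderDemo` and helper `extract` are hypothetical. Note that (A|B){2} also accepts AA and BB (for example two names and no date), which is what the lookahead in the one-pattern answer rules out; here the helper simply rejects a match in which either group is missing. Like that answer, it relies on java.util.regex keeping a group's capture from an earlier repetition.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EitherOrderDemo {
    // (A|B){2} with A = (?<name>[a-z]{3}) and B = (?<date>[0-9]{14}); each name is defined once.
    static final Pattern EITHER_ORDER =
            Pattern.compile("(?:(?<name>[a-z]{3})|(?<date>[0-9]{14})){2}");

    // Returns "name date" when the input is A then B or B then A, otherwise null.
    static String extract(String input) {
        Matcher m = EITHER_ORDER.matcher(input);
        if (!m.matches()) {
            return null;
        }
        String name = m.group("name");
        String date = m.group("date");
        // AA and BB also match (A|B){2}; reject them because one group was never set.
        if (name == null || date == null) {
            return null;
        }
        return name + " " + date;
    }

    public static void main(String[] args) {
        String[] inputs = {"abc20141220202020", "20141220202020abc", "abcdef"};
        for (String input : inputs) {
            System.out.println(input + " -> " + extract(input));
        }
    }
}
```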

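Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.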
