Need a regular expression to parse multi-line environment variables

I want to parse a file that is a list of environment variables, similar to this example:

TPS_LIB_DIR = "$DEF_VERSION_DIR\lib\ver215";

TPS_PH_DIR = "$DEF_VERSION_DIR";

TPS_SCHEMA_DIR = "~TPS_DIR\Supersedes\code;" +
                "~TPR_DIR\..\Supersedes\code;" +
                "~TPN_DIR\..\..\Supersedes\code;" +
                "$TPS_VERSION_DIR";

TPS_LIB_DIR = "C:\prog\lib";

BASE_DIR     = "C:\prog\base";

SPARS_DIR    = "C:\prog\spars";

SIGNALFILE_DIR = "E:\SIGNAL_FILES";
SIGNALFILE2_DIR = "E:\SIGNAL_FILES2";
SIGNALFILE3_DIR = "E:\SIGNAL_FILES2";

I came up with this regular expression, which matches the single-line definitions fine but does not match the multi-line ones:

(\w+)\s*=\s*(.*);[\r\n]+
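A quick way to see why the pattern misses the multi-line definition: in Python's `re` (as in Java's `java.util.regex`), `.` does not match a newline unless DOTALL is set, and here `;` must also be followed directly by a line break. So `(.*);[\r\n]+` can never get past the first line of `TPS_SCHEMA_DIR`. A minimal sketch, assuming the file content is already in a string (the shortened `SAMPLE` is an illustrative excerpt of the file above):

```python
import re

# Illustrative excerpt of the file from the question.
SAMPLE = r'''TPS_PH_DIR = "$DEF_VERSION_DIR";

TPS_SCHEMA_DIR = "~TPS_DIR\Supersedes\code;" +
                "~TPR_DIR\..\Supersedes\code;" +
                "$TPS_VERSION_DIR";

BASE_DIR     = "C:\prog\base";
'''

# The pattern from the question: `.` stops at line breaks, and the `;`
# must be immediately followed by a line break.
single_line = re.compile(r'(\w+)\s*=\s*(.*);[\r\n]+')

names = [m.group(1) for m in single_line.finditer(SAMPLE)]
print(names)  # ['TPS_PH_DIR', 'BASE_DIR'] -- TPS_SCHEMA_DIR is skipped
```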

Does anyone know of a regular expression that will parse every definition in this file, with the environment variable name in group 1 and the value (the right-hand side of the =) in group 2? Even better would be if the multiple paths were in separate groups, but I can handle that part manually.
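On getting each path into its own group: with Python's built-in `re` (and likewise Java's `java.util.regex`), a capturing group under a quantifier keeps only its last repetition, so one pattern cannot hand back a variable number of paths as separate groups. A small illustration using a made-up two-path value; the usual workaround is a second pass over the captured value:

```python
import re

value = '"C:\\a;" + "C:\\b"'

# One capturing group repeated with `+`: each repetition overwrites the previous capture.
m = re.fullmatch(r'(?:\s*\+?\s*"([^"]*)")+', value)
print(m.group(1))  # C:\b  (only the last quoted segment survives)

# A second pass with findall recovers every quoted segment instead.
print(re.findall(r'"([^"]*)"', value))
```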

UPDATE:

Here is what I ended up implementing. The first pattern, p, matches each environment variable block. The second pattern, valpattern, extracts the one or more values of each variable. Hope someone finds this useful.

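import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ProcessENV {

    private static void parse(File filename) {
        // Group 1: variable name. Group 2: everything up to and including the first `";`.
        // [\s\S] also matches line breaks, so a block may span several lines.
        Pattern p = Pattern.compile("(\\w+)\\s*=\\s*([\\s\\S]+?\";)");
        // One quoted segment of a value. `.` never matches a line terminator,
        // so each match stays on one line and needs no newline stripping.
        Pattern valpattern = Pattern.compile("\\s*\"(.+)\"\\s*");
        try {
            String str = readFile(filename, StandardCharsets.UTF_8);
            Matcher matcher = p.matcher(str);
            while (matcher.find()) {
                String key = matcher.group(1);
                Matcher valmatcher = valpattern.matcher(matcher.group(2));
                System.out.println(key);
                while (valmatcher.find()) {
                    System.out.println("\t" + valmatcher.group(1));
                }
            }
        } catch (IOException e) {
            System.err.println("Error: ProcessENV.parse -- problem parsing file: " + filename);
            e.printStackTrace();
        }
    }

    // Reads the whole file into one string so the regex can match across lines.
    static String readFile(File file, Charset encoding) throws IOException {
        byte[] encoded = Files.readAllBytes(file.toPath());
        return new String(encoded, encoding);
    }
}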
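For comparison, a rough Python translation of the same two-pass idea: an outer pattern for each name = "...", block, then an inner pattern for the quoted segments. This is a sketch, not the author's code; the `parse_env` name and the list-of-pairs return value are my own choices, the latter because the sample file defines `TPS_LIB_DIR` twice and a dict would silently keep only the last one.

```python
import re

# Outer pattern, same as the Java `p`: name, then everything up to the first `";`.
BLOCK = re.compile(r'(\w+)\s*=\s*([\s\S]+?";)')
# Inner pattern, same idea as `valpattern`: one quoted segment per line.
SEGMENT = re.compile(r'\s*"(.+)"\s*')


def parse_env(text):
    """Return a list of (name, [values]) pairs in file order."""
    result = []
    for block in BLOCK.finditer(text):
        values = [seg.group(1) for seg in SEGMENT.finditer(block.group(2))]
        result.append((block.group(1), values))
    return result


if __name__ == "__main__":
    sample = r'''TPS_SCHEMA_DIR = "~TPS_DIR\Supersedes\code;" +
                "~TPR_DIR\..\Supersedes\code;" +
                "$TPS_VERSION_DIR";

TPS_LIB_DIR = "C:\prog\lib";
'''
    for name, values in parse_env(sample):
        print(name)
        for value in values:
            print("\t" + value)
```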

It is simpler to split on '=' and '";' (here s holds the file contents):

[ [part.strip() for part in c.split('=', 1)] for c in s.split('";') if c.strip() ]

Or, with a nested comprehension that also splits out the individual paths:

[ [name.strip(), *[path.strip().strip('"') for path in value.split('+')]] for name, value in (c.split('=', 1) for c in s.split('";') if c.strip()) ]

The split could also be done with re, adding \s* to absorb the whitespace around the separators:

r = re.split(r'\s*=\s*|";\s*', text)

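The even-indexed elements, r[::2], are then the variable names and the odd-indexed ones, r[1::2], the values; after that, strip the leading quote and the extra whitespace from the values. Note that the final '";' leaves an empty string as the last element of r.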
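A sketch of that post-processing step, assuming the whole file is in `text`. The cleanup (dropping the trailing empty element, then splitting each value on the '+' concatenations) is my own addition rather than part of the answer, and it assumes no path contains '+' or '='.

```python
import re

text = r'''TPS_PH_DIR = "$DEF_VERSION_DIR";

TPS_SCHEMA_DIR = "~TPS_DIR\Supersedes\code;" +
                "$TPS_VERSION_DIR";

BASE_DIR     = "C:\prog\base";
'''

r = re.split(r'\s*=\s*|";\s*', text)
# The final `";` leaves an empty string at the end; drop it before pairing.
if r and r[-1] == '':
    r.pop()
names, values = r[::2], r[1::2]

pairs = []
for name, value in zip(names, values):
    # Each value still starts with its opening quote; multi-line values also
    # contain `" +` followed by whitespace and the next opening quote.
    paths = [p.strip().strip('"') for p in value.split('+')]
    pairs.append((name.strip(), paths))

print(pairs)
```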

You can use the following regex:

(\w+)\s*=\s*([\s\S]+?)";

It starts by matching group 1, one or more word characters, followed by zero or more whitespace characters, an equals sign and zero or more whitespace characters. Group 2 then matches one or more characters of any kind, including line breaks ([\s\S] matches everything), non-greedily, and the pattern ends with the closing double quote and a semicolon. The lazy +? stops group 2 at the first '";', so a multi-line value is captured in one piece; note that the opening quote ends up as the first character of group 2.

That will match every definition in the file, including the multi-line TPS_SCHEMA_DIR one.
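Here is that regex applied in Python, with one small variation of my own: the opening `"` is moved outside group 2 so the captured value carries neither quote. The individual paths are then split out of group 2 in a second step; the separator pattern `"\s*\+\s*"` assumes segments are always joined by a quote, optional whitespace, `+`, more whitespace and another quote, as in the sample.

```python
import re

text = r'''TPS_LIB_DIR = "$DEF_VERSION_DIR\lib\ver215";

TPS_SCHEMA_DIR = "~TPS_DIR\Supersedes\code;" +
                "~TPR_DIR\..\Supersedes\code;" +
                "~TPN_DIR\..\..\Supersedes\code;" +
                "$TPS_VERSION_DIR";
SIGNALFILE_DIR = "E:\SIGNAL_FILES";
'''

# The answer's regex, with the opening quote moved out of group 2.
definition = re.compile(r'(\w+)\s*=\s*"([\s\S]+?)";')
# Separator between concatenated string segments: `"`, whitespace, `+`, whitespace, `"`.
joiner = re.compile(r'"\s*\+\s*"')

for m in definition.finditer(text):
    name, value = m.group(1), m.group(2)
    print(name, joiner.split(value))
```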

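Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM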