
Need Regular Expression to parse multi-line environmental variables

I want to parse a file that is a list of environmental variables similar to this example:

TPS_LIB_DIR = "$DEF_VERSION_DIR\lib\ver215";

TPS_PH_DIR = "$DEF_VERSION_DIR";

TPS_SCHEMA_DIR = "~TPS_DIR\Supersedes\code;" +
                "~TPR_DIR\..\Supersedes\code;" +
                "~TPN_DIR\..\..\Supersedes\code;" +
                "$TPS_VERSION_DIR";

TPS_LIB_DIR = "C:\prog\lib";

BASE_DIR     = "C:\prog\base";

SPARS_DIR    = "C:\prog\spars";

SIGNALFILE_DIR = "E:\SIGNAL_FILES";
SIGNALFILE2_DIR = "E:\SIGNAL_FILES2";
SIGNALFILE3_DIR = "E:\SIGNAL_FILES2";

I came up with this regular expression that matches the single line definitions fine, but it will not match the multi-line definitions.

(\w+)\s*=\s*(.*);[\r\n]+

Does anyone know of a regular expression which will parse all lines in this file where the environmental variable name is in group 1 and the value (on right side of =) is in group 2? Even better would be if the multiple paths were in separate groups, but I can handle that part manually.

UPDATE:

Here is what I ended up implementing. The first pattern "Pattern p" matches the individual environmental variable blocks. The second pattern, "Pattern valpattern" parses the one or more values for each environmental variable. Hope someone finds this useful.

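import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

private static void parse(File filename) {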
    Pattern p = Pattern.compile("(\\w+)\\s*=\\s*([\\s\\S]+?\";)");
    Pattern valpattern = Pattern.compile("\\s*\"(.+)\"\\s*");
    try {
        String str = readFile(filename, StandardCharsets.UTF_8);
        Matcher matcher = p.matcher(str);
        while(matcher.find()) {
            String key = matcher.group(1);
            Matcher valmatcher = valpattern.matcher(matcher.group(2));
            System.out.println(key);
            while(valmatcher.find()) {                  
                System.out.println("\t" + valmatcher.group(1).replaceAll(System.getProperty("line.separator"), ""));
            }
        }
    } catch (IOException e) {
        System.out.println("Error: ProcessENV.parse -- problem parsing file: " + filename + System.lineSeparator());
        e.printStackTrace();
    }
}

static String readFile(File file, Charset encoding) throws IOException {
    byte[] encoded = Files.readAllBytes(file.toPath());
    return new String(encoded, encoding);
}

It is simpler to split on '=' and '";' (shown here in Python, with the file contents in s).

[c.strip().split(' = ') for c in s.split('";')]

Or with double comprehension to get the individual paths:

[[name.strip(), *[x.strip(' \t\r\n";') for x in value.split('+')]] for name, value in (c.split('=', 1) for c in s.split('";') if '=' in c)]

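The split could also be done with re, adding \s* to swallow the whitespace around each delimiter:

r = re.split(r'\s*=\s*|";\s*', text)

The even elements r[::2] are the variable names and the odd elements r[1::2] are the values. If the text ends with '";', the last element is an empty string. The values still need their quotes and extra whitespace stripped.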
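To make the re.split idea concrete, here is a minimal sketch. The names parse_env and sample are made up for illustration, and it assumes that no path contains '=' or '+' and that every definition ends in '";'. It also strips the ';' separators from the individual paths, which is a choice of this sketch rather than something the answer specifies.

```python
import re

def parse_env(text):
    """Return {name: [path, ...]} (illustrative helper, not from the answer)."""
    # Split on '=' (with surrounding whitespace) and on the closing '";'.
    parts = re.split(r'\s*=\s*|";\s*', text.strip())
    # Even slots hold names, odd slots hold raw values; zip drops the
    # trailing empty string left after the final '";'.
    names, values = parts[::2], parts[1::2]
    result = {}
    for name, value in zip(names, values):
        # A value is one or more quoted strings joined by '+'.
        # Strip whitespace, double quotes and ';' separators from each piece.
        result[name] = [piece.strip(' \t\r\n";') for piece in value.split('+')]
    return result

sample = r'''TPS_PH_DIR = "$DEF_VERSION_DIR";

TPS_SCHEMA_DIR = "~TPS_DIR\Supersedes\code;" +
                "~TPR_DIR\..\Supersedes\code;" +
                "$TPS_VERSION_DIR";
'''

for name, paths in parse_env(sample).items():
    print(name, paths)
```

Because the result is a dict, a name defined twice (like TPS_LIB_DIR in the question's file) keeps only its last definition; collect into a list of pairs instead if every definition matters.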

You can use the following regex:

(\w+)\s*=\s*([\s\S]+?)";

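It matches a run of word characters (group 1), zero or more whitespace characters, an equals sign, zero or more whitespace characters, then one or more of any character including newlines, matched non-greedily (group 2), and finally the last double quote and the semicolon. Note that group 2 still includes the opening double quote of the value.

That matches every definition in the file, single-line or multi-line.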
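As a rough illustration of how this regex behaves, the sketch below applies it in Python. The second pattern, QUOTED, and the names parse_entries and sample are assumptions added for the example. Since the outer pattern consumes the final closing quote, the sketch puts one back before extracting the quoted pieces.

```python
import re

# Pattern from the answer: group 1 is the name, group 2 is everything
# between the '=' and the first '";' that follows it.
ENTRY = re.compile(r'(\w+)\s*=\s*([\s\S]+?)";')
# Hypothetical helper pattern: the contents of one double-quoted string.
QUOTED = re.compile(r'"([^"]*)"')

def parse_entries(text):
    """Return a list of (name, [quoted pieces]) in file order."""
    entries = []
    for name, raw in ENTRY.findall(text):
        # ENTRY consumed the last closing quote, so restore it to keep
        # the quotes balanced before pulling out each piece.
        entries.append((name, QUOTED.findall(raw + '"')))
    return entries

sample = r'''TPS_LIB_DIR = "$DEF_VERSION_DIR\lib\ver215";

TPS_SCHEMA_DIR = "~TPS_DIR\Supersedes\code;" +
                "$TPS_VERSION_DIR";

TPS_LIB_DIR = "C:\prog\lib";
'''

for name, pieces in parse_entries(sample):
    print(name, pieces)
```

A list of pairs is used here rather than a dict so that repeated names such as TPS_LIB_DIR are all kept. The pieces are returned exactly as quoted, including any trailing ';'.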
