
Create string templates from arbitrary regular expressions?

Regular expressions are normally used to parse strings that are already formatted, but I would like to use them the other way round: take a raw string of characters and format it. Examples:

// phone number
format("\(\d{3}\) \d{3}-\d{4}", "1234567890");
// should return "(123) 456-7890"
// date
format("\d{4}-\d{2}-\d{2}", "20180712");
// should return "2018-07-12"
// arbitrary
format("([A-Z]+-\d+ )+", "ABC123DEFGH45IJ6789");
// should return "ABC-123 DEFGH-45 IJ-6789 "

The above are just examples; I'd like a general solution that works for any arbitrary regex and any arbitrary string (that fits the regex).

Here's what I have so far, which is a little inelegant, and really limited in its abilities, but does satisfy the first 2 of the 3 examples above:

function consumeCharacters(amount) {
  return (characterArray) => {
    return characterArray.splice(0, amount).join('');
  };
}

function parseSimpleRegex(regexString) {
  // filter out backslash escapes
  let parsed = regexString.replace(/\\./g, (...args) => {
    return args[0][args[0].length-1];
  });
  // get literal characters
  let literals = parsed.split(/d\{\d\}/);
  // get variable symbols
  let variables = parsed.match(/d\{\d\}/g);
  let varFunctions = variables.map(variable => consumeCharacters(variable[2]));
  let result = [];
  while (literals.length > 0) {
    result.push(literals.shift());
    result.push(varFunctions.shift());
  }
  while (varFunctions.length > 0) {
    result.push(varFunctions.shift());
  }
  // filter out undefineds & empty strings
  result = result.filter(resultPart => !!resultPart);
  return result;
}

function format(regexString, rawString) {
  let rawCharacters = rawString.split('');
  let formatter = null;
  try {
    formatter = parseSimpleRegex(regexString);
  } catch (e) {
    return 'failed parsing regex';
  }
  let formattedString = formatter.map((format) => {
    if (typeof format === 'string') {
      return format;
    }
    if (typeof format === 'function') {
      return format(rawCharacters);
    }
  }).join('');
  return formattedString;
}

const testCases = [
  {
    args: ["\\(\\d{3}\\) \\d{3}-\\d{4}", "1234567890"],
    expected: "(123) 456-7890"
  },
  {
    args: ["\\d{4}-\\d{2}-\\d{2}", "20180712"],
    expected: "2018-07-12"
  },
  {
    args: ["([A-Z]+-\\d+ )+", "ABC123DEFGH45IJ6789"],
    expected: "ABC-123 DEFGH-45 IJ-6789 "
  },
];

testCases.forEach((testCase, index) => {
  const result = format(...testCase.args);
  const expected = testCase.expected;
  if (result === expected) {
    console.log(`Test Case #${index+1} passed`);
  } else {
    console.log(`Test Case #${index+1} failed, expected: "${expected}", result: "${result}"`);
  }
});

Can the above solution be scaled for more complex regexes? Or is there a better alternative approach?

The general answer is: use a regex that creates groups, then use replace with backreferences to format the output.

For example, using your first example, use this regex:

/(\d{3})(\d{3})(\d{4})/

It creates three groups: the first 3 digits, the next 3 digits, and the final 4 digits.

Now, to format, use the string.replace function with the following replacement pattern:

($1) $2-$3

It will add parentheses around the first group, add a space, then the second group, and finally a hyphen and the last group.

How to use:

You can create your formatPhone function like this:

function formatPhone(rawPhone)
{
    return rawPhone.replace(/(\d{3})(\d{3})(\d{4})/, '($1) $2-$3');
}

You can do the same with your other patterns.
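As one illustration, the date example from the question could be handled the same way. This is only a minimal sketch; the name formatDate is hypothetical and does not appear in the original answer.

```javascript
// Hypothetical helper mirroring formatPhone: capture year, month, and day,
// then join the three groups with hyphens.
function formatDate(rawDate) {
    return rawDate.replace(/(\d{4})(\d{2})(\d{2})/, '$1-$2-$3');
}

console.log(formatDate("20180712")); // "2018-07-12"
```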

Edit:

A totally general solution requires that you pass the raw string, the regex pattern, and the replacement pattern to your function, like this:

function format(rawString, regex, replacement)
{
   return rawString.replace(regex, replacement);
}

where regex and replacement must follow the rules described above.
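A usage sketch for this three-argument format, assuming the caller writes both the regex and the replacement by hand. The second call relies on the g flag so that one "capitals, then digits" pattern is applied repeatedly; that pattern is an assumption about the question's "arbitrary" example, not something the answer itself spells out.

```javascript
function format(rawString, regex, replacement) {
    return rawString.replace(regex, replacement);
}

const phone = format("1234567890", /(\d{3})(\d{3})(\d{4})/, "($1) $2-$3");
// With the g flag, every run of capitals followed by digits is rewritten in turn.
const codes = format("ABC123DEFGH45IJ6789", /([A-Z]+)(\d+)/g, "$1-$2 ");

console.log(phone); // "(123) 456-7890"
console.log(codes); // "ABC-123 DEFGH-45 IJ-6789 "
```

Note that the regex here describes the raw input, and the formatting lives entirely in the replacement string.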

Edit 2:

I think you have misunderstood something here. Let's take your first example:

format("\(\d{3}\) \d{3}-\d{4}", "1234567890");

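Here the regex simply doesn't match! The pattern describes the already formatted string, including the parentheses, the space, and the hyphen, so it cannot match the raw digits. In short, you can't make a function that takes a format regex. Regexes are made to match (and possibly replace), as shown above.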
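The mismatch is easy to check directly. A small sketch, with the question's pattern written as a regex literal (the variable names are illustrative only):

```javascript
// The "format" pattern from the question, written as a regex literal.
const formatPattern = /\(\d{3}\) \d{3}-\d{4}/;

// The raw digits contain no "(", space, or "-", so the pattern cannot match them.
const matchesRaw = formatPattern.test("1234567890");
// The pattern describes the formatted form, so that is what it matches.
const matchesFormatted = formatPattern.test("(123) 456-7890");
// replace returns the input untouched when nothing matches.
const unchanged = "1234567890".replace(formatPattern, "X");

console.log(matchesRaw, matchesFormatted, unchanged); // false true 1234567890
```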

You could use the pattern (\d{3})(\d{3})(\d{4}) and substitute it with \1-\2-\3, which yields 123-456-7890.

For the third example, use (\w{3})(\w{3})(\w{5})(\w{2})(\w{2})(\w{4}) and replace it with \1-\2 \3-\4 \5-\6, which returns ABC-123 DEFGH-45 IJ-6789.

In general, use (\w{n})...(\w{m}), where n and m are integers, to capture parts of a string into particular groups (you could specify those integers with an array). You could also provide the separators in an array to build the replacement pattern.
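In JavaScript, group references in a replacement string are written $1, $2, and so on. A minimal sketch of the third example under that convention (variable names are illustrative):

```javascript
const raw = "ABC123DEFGH45IJ6789";
// Six fixed-width groups: 3, 3, 5, 2, 2, and 4 word characters.
const fixedWidth = /(\w{3})(\w{3})(\w{5})(\w{2})(\w{2})(\w{4})/;
// Each $n inserts the text captured by group n; everything else is literal.
const formattedCodes = raw.replace(fixedWidth, "$1-$2 $3-$4 $5-$6");

console.log(formattedCodes); // "ABC-123 DEFGH-45 IJ-6789"
```

The group widths are hard-coded to this particular input, which is why they have to be supplied by the caller rather than derived from a "format" regex.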

Demo

UPDATE

As I said, a general solution would be to supply the sizes of the blocks the string should be split into, along with an array of separators. See the code below:

var str =  "ABC123DEFGH45IJ6789";
var blockSizes = [3,3,5,2,2,4];
var separators = ["-"," ","-"," ","-"];
var pattern = "(\\w{" + blockSizes[0] + "})";
var replacementPattern = "$1";
var i;
for(i = 1; i < blockSizes.length; i++)
{
    pattern += "(\\w{" + blockSizes[i] + "})";
    replacementPattern += separators[i - 1] + "$" + (i + 1);
}

Now just use these patterns in a replace call and you're done:

JS fiddle

Regex demo
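For completeness, the block-size approach could be wrapped in a function that also performs the replace. This is a sketch under assumptions: the name formatBlocks is hypothetical, the input is assumed to consist of word characters, and the separators are assumed not to contain "$" (which has a special meaning in replacement strings).

```javascript
// Hypothetical wrapper: builds one capture group per block size and joins
// the captured groups with the given separators.
function formatBlocks(str, blockSizes, separators) {
    let pattern = "";
    let replacementPattern = "";
    for (let i = 0; i < blockSizes.length; i++) {
        pattern += "(\\w{" + blockSizes[i] + "})";
        // No separator goes before the first group.
        replacementPattern += (i > 0 ? separators[i - 1] : "") + "$" + (i + 1);
    }
    return str.replace(new RegExp(pattern), replacementPattern);
}

console.log(formatBlocks("ABC123DEFGH45IJ6789", [3, 3, 5, 2, 2, 4], ["-", " ", "-", " ", "-"]));
// "ABC-123 DEFGH-45 IJ-6789"
console.log(formatBlocks("1234567890", [3, 3, 4], ["-", "-"]));
// "123-456-7890"
```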


 