Regular expressions are normally used to parse strings that are already formatted, but I would like to use them to take raw strings of characters and format them. Examples:
// phone number
format("\(\d{3}\) \d{3}-\d{4}", "1234567890");
// should return "(123) 456-7890"
// date
format("\d{4}-\d{2}-\d{2}", "20180712");
// should return "2018-07-12"
// arbitrary
format("([A-Z]+-\d+ )+", "ABC123DEFGH45IJ6789");
// should return "ABC-123 DEFGH-45 IJ-6789 "
The above are just examples; I'd like a general solution that works for any arbitrary regex and any arbitrary string (that fits the regex).
Here's what I have so far, which is a little inelegant, and really limited in its abilities, but does satisfy the first 2 of the 3 examples above:
function consumeCharacters(amount) {
  return (characterArray) => {
    return characterArray.splice(0, amount).join('');
  };
}

function parseSimpleRegex(regexString) {
  // filter out backslash escapes
  let parsed = regexString.replace(/\\./g, (...args) => {
    return args[0][args[0].length-1];
  });
  // get literal characters
  let literals = parsed.split(/d\{\d\}/);
  // get variable symbols
  let variables = parsed.match(/d\{\d\}/g);
  let varFunctions = variables.map(variable => consumeCharacters(variable[2]));
  let result = [];
  while (literals.length > 0) {
    result.push(literals.shift());
    result.push(varFunctions.shift());
  }
  while (varFunctions.length > 0) {
    result.push(varFunctions.shift());
  }
  // filter out undefineds & empty strings
  result = result.filter(resultPart => !!resultPart);
  return result;
}

function format(regexString, rawString) {
  let rawCharacters = rawString.split('');
  let formatter = null;
  try {
    formatter = parseSimpleRegex(regexString);
  } catch (e) {
    return 'failed parsing regex';
  }
  let formattedString = formatter.map((format) => {
    if (typeof format === 'string') {
      return format;
    }
    if (typeof format === 'function') {
      return format(rawCharacters);
    }
  }).join('');
  return formattedString;
}

const testCases = [
  {
    args: ["\\(\\d{3}\\) \\d{3}-\\d{4}", "1234567890"],
    expected: "(123) 456-7890"
  },
  {
    args: ["\\d{4}-\\d{2}-\\d{2}", "20180712"],
    expected: "2018-07-12"
  },
  {
    args: ["([A-Z]+-\\d+ )+", "ABC123DEFGH45IJ6789"],
    expected: "ABC-123 DEFGH-45 IJ-6789 "
  },
];

testCases.forEach((testCase, index) => {
  const result = format(...testCase.args);
  const expected = testCase.expected;
  if (result === expected) {
    console.log(`Test Case #${index+1} passed`);
  } else {
    console.log(`Test Case #${index+1} failed, expected: "${expected}", result: "${result}"`);
  }
});
Can the above solution be scaled for more complex regexes? Or is there a better alternative approach?
The general answer is: use a regex that creates groups, then use replace with backreferences to format the output.
For example, using your first example, use this regex:
/(\d{3})(\d{3})(\d{4})/
It creates three groups: the first 3 digits, the next 3 digits, and the final 4 digits.
To format, use the string.replace function with the following replacement pattern:
($1) $2-$3
It puts parentheses around the first group, adds a space, then the second group, and finally a hyphen and the last group.
How to use:
You can create your formatPhone function like this:
function formatPhone(rawPhone)
{
    return rawPhone.replace(/(\d{3})(\d{3})(\d{4})/, '($1) $2-$3');
}
You can do similar with your other patterns.
Edit:
A fully general solution requires that you pass the raw string, the regex pattern, and the replacement pattern to your function, like this:
function format(rawString, regex, replacement)
{
    return rawString.replace(regex, replacement);
}
where regex and replacement must follow the rules described above.
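As a rough sketch, here is how the three examples from the question might be handled with that function. The specific regexes and replacement strings below are illustrative choices, not something given in the question; the third one assumes the input always alternates uppercase letters and digits, and relies on the g flag so the same two-group pattern is applied repeatedly.

```javascript
// Generic helper: the caller supplies both the capturing regex and the replacement pattern.
function format(rawString, regex, replacement) {
  return rawString.replace(regex, replacement);
}

const phone = format("1234567890", /(\d{3})(\d{3})(\d{4})/, "($1) $2-$3");
const date = format("20180712", /(\d{4})(\d{2})(\d{2})/, "$1-$2-$3");
// The g flag repeats the letters-then-digits pattern across the whole string.
const arbitrary = format("ABC123DEFGH45IJ6789", /([A-Z]+)(\d+)/g, "$1-$2 ");

console.log(phone);     // (123) 456-7890
console.log(date);      // 2018-07-12
console.log(arbitrary); // ABC-123 DEFGH-45 IJ-6789 (with a trailing space)
```

Because the replacement is applied once per match when the g flag is set, the third call works for any number of letter/digit pairs without listing every group by hand.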
Edit 2:
I think you have misunderstood something here. Let's take your first example:
format("\(\d{3}\) \d{3}-\d{4}", "1234567890");
Here the regex simply doesn't match the raw string. So in short, you can't make a function that takes a format regex. Regexes are made to match (and possibly replace) as shown above.
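A quick way to see the mismatch, as a small sketch (the variable names are only for illustration): the "format" regex describes the already formatted output, so it matches that output but not the raw digits, and replace leaves a non-matching string untouched.

```javascript
// The regex from the question describes the *formatted* phone number.
const formatRegex = /\(\d{3}\) \d{3}-\d{4}/;

const matchesRaw = formatRegex.test("1234567890");           // raw digits
const matchesFormatted = formatRegex.test("(123) 456-7890"); // formatted output

// When nothing matches, replace returns the input unchanged.
const unchanged = "1234567890".replace(formatRegex, "anything");

console.log(matchesRaw);       // false
console.log(matchesFormatted); // true
console.log(unchanged);        // 1234567890
```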
You could use the pattern (\\d{3})(\\d{3})(\\d{4}) and substitute it with \\1-\\2-\\3, which yields 123-456-7890.
For the third example, use (\\w{3})(\\w{3})(\\w{5})(\\w{2})(\\w{2})(\\w{4}) and replace it with \\1-\\2 \\3-\\4 \\5-\\6, which returns ABC-123 DEFGH-45 IJ-6789.
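In JavaScript specifically, group references in the replacement string of String.prototype.replace are written $1, $2, and so on rather than \\1, \\2. A minimal sketch of the two substitutions above in that notation (variable names are illustrative only):

```javascript
// Same patterns as above, written as JavaScript regex literals with $n references.
const phoneDashed = "1234567890".replace(/(\d{3})(\d{3})(\d{4})/, "$1-$2-$3");

const thirdExample = "ABC123DEFGH45IJ6789".replace(
  /(\w{3})(\w{3})(\w{5})(\w{2})(\w{2})(\w{4})/,
  "$1-$2 $3-$4 $5-$6"
);

console.log(phoneDashed);  // 123-456-7890
console.log(thirdExample); // ABC-123 DEFGH-45 IJ-6789
```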
Generally, use (\\w{n})...(\\w{m}), where n and m are integers, to capture parts of a string into particular groups (you could specify those integers with an array). You could also provide the separators in an array to form your patterns.
UPDATE
As I said, a general solution would be to supply the sizes of the blocks that the string should be split into, along with an array of separators. See the code below:
var str = "ABC123DEFGH45IJ6789";
var blockSizes = [3,3,5,2,2,4];
var separators = ["-"," ","-"," ","-"];
var pattern = "(\\w{" + blockSizes[0] + "})";
var replacementPattern = "$1";
var i;
for(i = 1; i < blockSizes.length; i++)
{
    pattern += "(\\w{" + blockSizes[i] + "})";
    replacementPattern += separators[i - 1] + "$" + (i + 1);
}
Now just use these patterns in a replace call and you're done.
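The final replace call is not shown in the answer, so here is a minimal sketch of the whole approach wrapped in one function. The helper name formatBlocks and its signature are assumptions made for illustration, as are the ^ and $ anchors, which make the input pass through unchanged when it does not fit the block sizes exactly. It also assumes the separators contain no $ characters and do not start with a digit, since those would be misread as part of a group reference.

```javascript
// Hypothetical helper: builds the pattern and the replacement string from
// block sizes and separators, then applies them with String.prototype.replace.
function formatBlocks(str, blockSizes, separators) {
  let pattern = "";
  let replacementPattern = "";
  for (let i = 0; i < blockSizes.length; i++) {
    pattern += "(\\w{" + blockSizes[i] + "})";
    if (i > 0) {
      replacementPattern += separators[i - 1];
    }
    replacementPattern += "$" + (i + 1);
  }
  // Anchored so that strings of the wrong length are returned unchanged.
  return str.replace(new RegExp("^" + pattern + "$"), replacementPattern);
}

const blocks = formatBlocks(
  "ABC123DEFGH45IJ6789",
  [3, 3, 5, 2, 2, 4],
  ["-", " ", "-", " ", "-"]
);
const phoneBlocks = formatBlocks("1234567890", [3, 3, 4], ["-", "-"]);

console.log(blocks);      // ABC-123 DEFGH-45 IJ-6789
console.log(phoneBlocks); // 123-456-7890
```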