Parsing and rewriting XHTML in one step?

Question

I need to take this input:

<Person>
  <name>
    <first>John</first>
    <last>Galt</last>
  </name>
</Person>

And regex my way to this output:

<div>&lt;Person&gt;
  <div>&lt;name&gt;
    <div>&lt;first&gt;John&lt;/first&gt;</div>
    <div>&lt;last&gt;Galt&lt;/last&gt;</div>
  &lt;/name&gt;</div>
&lt;/Person&gt;</div>

I have a solution that *works:

var output = input.replace(/([<])\/([a-zA-Z][A-Z0-9]*)([^>]*)([>])/g, "&lt;/$2$3&gt;</div>");
    output = output.replace(/([<])([a-zA-Z][A-Z0-9]*)([^>]*)([>])/g, "<div>&lt;$2$3&gt;");

But I feel like it's a little inefficient, and I was wondering if a regex savant could help me clean it up, ideally into one step. My problem was that when I tried to do it all in one step, my regex couldn't handle nested elements. Thanks!

EDIT: Good catch, racraman.

Answer 1

To inject <div> and </div>, you could've used empty-group matching:

input.replace(/(<(\/)[^>\/]*>)|(<[^>\/]*>)/g,"$1<$2div>$3");

This would've produced:

<div><Person>
  <div><name>
    <div><first>John</first></div>
    <div><last>Galt</last></div>
  </name></div>
</Person></div>
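
As a minimal sketch of why this works: in a JavaScript replacement string, a group that did not take part in the match expands to an empty string, so only one branch of the alternation contributes text. The helper name `injectDivs` is made up for illustration.

```javascript
// Hypothetical helper wrapping the one-liner above.
// For a closing tag, $1 is the whole tag and $2 is "/", while $3 is empty.
// For an opening tag, $1 and $2 are empty and $3 is the whole tag.
function injectDivs(input) {
  return input.replace(/(<(\/)[^>\/]*>)|(<[^>\/]*>)/g, "$1<$2div>$3");
}

console.log(injectDivs("<first>John</first>"));
// prints <div><first>John</first></div>
```

Tags containing a slash elsewhere, such as self-closing tags, are not matched by this pattern and pass through unchanged.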

But you're also asking to replace < and > with &lt; and &gt; respectively. The replacement-string syntax of the regex engines I know doesn't support such transformations of group content within the same step: you're limited to inserting portions of the matched groups, or to quite primitive (uppercase/lowercase) transformations of them.
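
One caveat specific to JavaScript: String.prototype.replace also accepts a function as its second argument, and that function can build the replacement from the captured groups. The following is only a sketch of how a single pass might look under that approach; the name `wrapTags` is hypothetical, and it assumes tags contain no slash other than the leading one of a closing tag.

```javascript
// Hypothetical helper: escape each tag's angle brackets and wrap it in a <div>
// in one pass, using a replacer function instead of a replacement string.
function wrapTags(input) {
  return input.replace(/<(\/?)([^>\/]*)>/g, function (match, slash, name) {
    var escaped = "&lt;" + slash + name + "&gt;";
    // Closing tags get </div> appended; opening tags get <div> prepended.
    return slash ? escaped + "</div>" : "<div>" + escaped;
  });
}

console.log(wrapTags("<first>John</first>"));
// prints <div>&lt;first&gt;John&lt;/first&gt;</div>
```

Nesting is not a problem here because each tag is rewritten independently and the output is never rescanned.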

So I would've either simplified yours:

var output = input.replace(/<\/([^>]*)>/g, "&lt;/$1&gt;</div>");
    output = output.replace(/<([^>\/]*)>/g, "<div>&lt;$1&gt;");

or would've used the empty-groups approach:

var output = input
    .replace(/<((\/)([^>\/]*)|([^>\/]*))>/g, "&lt;$2$3&gt;<$2div>&lt;$4&gt;")
    .replace(/&lt;&gt;/g, "");
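
To sanity-check the two variants, they can be run against the sample from the question. This is a rough sketch rather than a test suite; the function names `simplified` and `emptyGroups` are invented here, and the closing-tag replacement is assumed to keep the slash.

```javascript
// Variant 1: closing tags first, then opening tags.
// The inserted </div> is skipped by the second pattern because it contains "/".
function simplified(input) {
  return input
    .replace(/<\/([^>]*)>/g, "&lt;/$1&gt;</div>")
    .replace(/<([^>\/]*)>/g, "<div>&lt;$1&gt;");
}

// Variant 2: one pattern for both tag kinds, then strip the empty "&lt;&gt;" left over.
function emptyGroups(input) {
  return input
    .replace(/<((\/)([^>\/]*)|([^>\/]*))>/g, "&lt;$2$3&gt;<$2div>&lt;$4&gt;")
    .replace(/&lt;&gt;/g, "");
}

var input = [
  "<Person>",
  "  <name>",
  "    <first>John</first>",
  "    <last>Galt</last>",
  "  </name>",
  "</Person>"
].join("\n");

var expected = [
  "<div>&lt;Person&gt;",
  "  <div>&lt;name&gt;",
  "    <div>&lt;first&gt;John&lt;/first&gt;</div>",
  "    <div>&lt;last&gt;Galt&lt;/last&gt;</div>",
  "  &lt;/name&gt;</div>",
  "&lt;/Person&gt;</div>"
].join("\n");

console.log(simplified(input) === expected, emptyGroups(input) === expected);
// prints true true
```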

solution1
1 2014-01-18 18:43:48