Parsing and rewriting XHTML in one step?

I need to take this input:

<Person>
  <name>
    <first>John</first>
    <last>Galt</last>
  </name>
</Person>

And regex my way to this output:

<div>&lt;Person&gt;
  <div>&lt;name&gt;
    <div>&lt;first&gt;John&lt;/first&gt;</div>
    <div>&lt;last&gt;Galt&lt;/last&gt;</div>
  &lt;/name&gt;</div>
&lt;/Person&gt;</div>

I have a solution that *works:

var output = input.replace(/([<])\/([a-zA-Z][A-Z0-9]*)([^>]*)([>])/g, "&lt;/$2$3&gt;</div>");
    output = output.replace(/([<])([a-zA-Z][A-Z0-9]*)([^>]*)([>])/g, "<div>&lt;$2$3&gt;");

But I feel like it's a little inefficient, and I was wondering if a regex savant could help me clean it up a little, ideally into one step. My problem was that my regex couldn't handle nested elements when I tried to do it all in one step. Thanks!

**EDIT: Good catch racraman

To inject <div> and </div> you could use empty-group matching:

input.replace(/(<(\/)[^>\/]*>)|(<[^>\/]*>)/g,"$1<$2div>$3");

This produces:

<div><Person>
  <div><name>
    <div><first>John</first></div>
    <div><last>Galt</last></div>
  </name></div>
</Person></div>
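
To see why this works, here is a minimal sketch run on one opening and one closing tag. It relies on the fact that in JavaScript a capture group that did not take part in the match is substituted as an empty string; the variable names re and template are only for illustration.

```javascript
// Same pattern and replacement string as above, split out for readability.
var re = /(<(\/)[^>\/]*>)|(<[^>\/]*>)/g;
var template = "$1<$2div>$3";

// Opening tag: only group 3 matches, so $1 and $2 are empty.
var opened = "<first>".replace(re, template);
console.log(opened); // <div><first>

// Closing tag: groups 1 and 2 match, so $3 is empty and $2 supplies the slash.
var closed = "</first>".replace(re, template);
console.log(closed); // </first></div>
```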

But you're also asking to replace < and > with &lt; and &gt; respectively. Replacement-string syntax in the common regex engines doesn't support transforming a group's content like that within the same step: you're limited to inserting portions of the matched groups or, in some engines, quite primitive (uppercase/lowercase) transformations of them.
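
That limitation concerns replacement strings. In JavaScript specifically, replace also accepts a function as its second argument, so if a callback is acceptable, a single pass is possible. A rough sketch, assuming tags without a self-closing slash, as in the sample input (wrapTags is a hypothetical helper name, not something from the question):

```javascript
// One-pass rewrite using a replacer function instead of a replacement string.
function wrapTags(input) {
  // Group 1 captures the optional leading slash, group 2 the rest of the tag.
  return input.replace(/<(\/?)([^>\/]*)>/g, function (match, slash, body) {
    var escaped = "&lt;" + slash + body + "&gt;";
    // Closing tags get </div> appended; opening tags get <div> prepended.
    return slash ? escaped + "</div>" : "<div>" + escaped;
  });
}

console.log(wrapTags("<first>John</first>"));
// <div>&lt;first&gt;John&lt;/first&gt;</div>
```

Self-closing tags such as <br/> are left untouched by this pattern, the same as with the patterns in this answer.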

So I would either simplify yours:

var output = input.replace(/<\/([^>]*)>/g, "&lt;/$1&gt;</div>");
    output = output.replace(/<([^>\/]*)>/g, "<div>&lt;$1&gt;");

or use the empty-groups approach:

var output = input
    .replace(/<((\/)([^>\/]*)|([^>\/]*))>/g, "&lt;$2$3&gt;<$2div>&lt;$4&gt;")
    .replace(/&lt;&gt;/g, "");
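
As a quick check, the empty-groups version can be run against the sample input from the question and compared with the requested output. This is only a sketch; emptyGroups is a hypothetical wrapper name.

```javascript
// Wraps the two chained replace calls so they can be tested on any string.
function emptyGroups(input) {
  return input
    // Every tag yields one filled "&lt;...&gt;" and one empty "&lt;&gt;".
    .replace(/<((\/)([^>\/]*)|([^>\/]*))>/g, "&lt;$2$3&gt;<$2div>&lt;$4&gt;")
    // Second pass strips the empty placeholders left behind.
    .replace(/&lt;&gt;/g, "");
}

var sample = [
  "<Person>",
  "  <name>",
  "    <first>John</first>",
  "    <last>Galt</last>",
  "  </name>",
  "</Person>"
].join("\n");

var expected = [
  "<div>&lt;Person&gt;",
  "  <div>&lt;name&gt;",
  "    <div>&lt;first&gt;John&lt;/first&gt;</div>",
  "    <div>&lt;last&gt;Galt&lt;/last&gt;</div>",
  "  &lt;/name&gt;</div>",
  "&lt;/Person&gt;</div>"
].join("\n");

console.log(emptyGroups(sample) === expected); // true
```

One caveat: the cleanup pass also removes any literal &lt;&gt; that was already present in the input text.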
