
Regex how to replace all instances except

I have several hundred XML files that need a slight change. I'm aware that I really should be using XSLT to make batch changes to XML structure, but I thought some quick and dirty regex would do what I need much faster than working out the XSLT. At least I thought that before spending hours trying to get the regex right!

Take the example below. I have various <seqlist> lists, each containing one <item> element per list entry. Each <item> element contains a <para> element with an id attribute whose value varies. I want to remove those <para> tags completely so that the <item> contains the actual text.

So from: <seqlist><item><para id="1.1">Some text here.</para></item></seqlist> To: <seqlist><item>Some text here.</item></seqlist>

This is fairly straightforward in itself; I can simply do:

Regex: <item><para id="([^\\"]*)"> Replace: <item>

Then remove the redundant closing tags with a simple find and replace:

Find: </para></item> Replace: </item>

However, as can be seen from the example below, some <item> elements in the list contain another <seqlist> nested within them, which in turn contains further nested <item> and <para> tags. This means the find and replace above, meant to remove the closing </para> tag, will also replace the closing </para> on the very last line of the example below.

Basically, what I need to say is: find </para></item> and replace it with </item> UNLESS there is an opening <para> element to the left of it.

The very last line of the example below explains it better. If I do the find and replace above, the last </para> will be removed and the file will no longer parse.

Any ideas how to achieve this, please?

<seqlist>
  <item><para id="p7.1"><emphasis>JRK Type 1</emphasis>: (NSP XX-XX-XXX-XXXX)
outputs:
   <seqlist>
     <item><para id="p7.1.1">12 V or 15 V,0-5A</para></item>
     <item><para id="p7.1.2">12 V or 15 V,0-5A</para></item>
   </seqlist></para>
      <para>Both at 120 W maximum output power.</para><para>The outputs are isolated, permitting parallel or serial connection to provide power as required.</para></item>
    <item><para id="p7.2"><emphasis>JRK Type 2:</emphasis> (NSN 6130-99-788-6945) outputs:</para>
   <seqlist>
     <item><para id="p7.2.1">5 V, 0 - 30 A</para></item>
     <item><para id="p7.2.2">12 V, 0 - 0.5 A</para></item>
   </seqlist><para>Both at 120 W maximum output power.</para>
  <para>The 12 V outputs are measured with respect to a common 0 V line but these are isolated from the 5 V output.</para></item>
</seqlist>
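
The failure described in the question can be reproduced with a short Python sketch. The sample string is a trimmed-down, hypothetical version of the XML above, and the helper names `naive_strip` and `is_well_formed` are made up for illustration. It applies the two replacements from the question and then checks whether the result still parses:

```python
import re
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the question's sample (an assumption, not the
# real file): the last <para> has no id, so step 1 leaves its opening tag.
SAMPLE = (
    '<seqlist>'
    '<item><para id="p7.2">Type 2 outputs:</para>'
    '<seqlist><item><para id="p7.2.1">5 V, 0 - 30 A</para></item></seqlist>'
    '<para>Both at 120 W maximum output power.</para></item>'
    '</seqlist>'
)

def naive_strip(xml_text):
    # Step 1: drop an opening <para id="..."> that directly follows <item>.
    step1 = re.sub(r'<item><para id="[^"]*">', '<item>', xml_text)
    # Step 2: drop every </para> that directly precedes </item>.
    return step1.replace('</para></item>', '</item>')

def is_well_formed(xml_text):
    # True if the standard-library parser accepts the text.
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

broken = naive_strip(SAMPLE)
print(is_well_formed(SAMPLE))  # True
print(is_well_formed(broken))  # False
```

Two things go wrong in this sample: the id-less `<para>` before the last `</item>` loses its closing tag, and the `</para>` after "Type 2 outputs:" is left behind because it is not directly followed by `</item>`. Pairing tags reliably needs something that understands nesting, which plain find and replace does not.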

Here is the trivial XSLT way:

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="seqlist/item/para">
        <xsl:apply-templates/>
    </xsl:template>
</xsl:transform>

Online at http://xsltransform.net/3NSSEw6 .

If only those para elements with an id attribute are to be removed then use

<xsl:template match="seqlist/item/para[@id]">
    <xsl:apply-templates/>
</xsl:template>

for that template instead, http://xsltransform.net/3NSSEw6/1 .
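
For anyone who would rather script this than run an XSLT processor, here is a rough Python equivalent of the two templates above, using only the standard library's `xml.etree.ElementTree`. It is a sketch, not a drop-in replacement: the function name `unwrap_item_paras` and the `only_with_id` flag are invented for this example, and ElementTree's default parser discards comments and processing instructions, which the XSLT identity template would keep.

```python
import xml.etree.ElementTree as ET

def unwrap_item_paras(root, only_with_id=False):
    """Replace each <para> child of a seqlist/item with the para's content."""
    # Collect the target items up front, before the tree is modified.
    items = [item
             for seqlist in root.iter("seqlist")
             for item in seqlist.findall("item")]
    for item in items:
        kept = []  # the item's new list of child elements

        def add_text(text):
            # ElementTree keeps text on the parent (.text) or on the
            # preceding sibling (.tail); append to whichever applies.
            if not text:
                return
            if kept:
                kept[-1].tail = (kept[-1].tail or "") + text
            else:
                item.text = (item.text or "") + text

        for child in list(item):
            is_target = child.tag == "para" and (
                not only_with_id or "id" in child.attrib)
            if is_target:
                add_text(child.text)
                kept.extend(child)   # hoist the para's child elements
                add_text(child.tail)
            else:
                kept.append(child)
        item[:] = kept
    return root

doc = ET.fromstring(
    '<seqlist><item><para id="1.1">Some text here.</para></item></seqlist>')
print(ET.tostring(unwrap_item_paras(doc), encoding="unicode"))
# <seqlist><item>Some text here.</item></seqlist>
```

Passing `only_with_id=True` corresponds to the `para[@id]` variant, so `<para>` elements without an id stay in place.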

