Replace double quote with &quot; in XML file

I have an XML file that contains quotes as follows:

<feast key="NAME" value="NAME TEST 'xxxxx"yyyy' $"/>

I need to replace xxxxx"yyyy with xxxxx&quot;yyyy in all occurrences.

Note: xxxxx and yyyy are user-defined, so they can take any form.

Here I have included the sample XML and the code that parses it.

TestSAXParse.xml

<?xml version="1.0" encoding="US-ASCII" ?> 
<TEST Office="TEST Office">
    <LINE key="112313133320">
        <TESTNO value="0"/>
        <FEATURE>
            <feast key="001" value="001"/>
            <feast key="NAME" value="NAME TEST 'xxxxx_&_yyyy' $"/>
        </FEATURE>
    </LINE>
    <LINE key="112313133321">
        <TESTNO value="0"/>
        <FEATURE>
            <feast key="002" value="002"/>
            <feast key="NAME" value="NAME TEST 'xxxxx"yyyy' $"/>
        </FEATURE>
    </LINE>
</TEST>

SaxParseEx.java

import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxParseEx extends DefaultHandler{

    private static String xmlFilePath = "/home/system/TestSAXParse.xml";

    public static void main(String[] args) {

        SaxParseEx SaxParseEx = new SaxParseEx();
        SAXParserFactory fact = SAXParserFactory.newInstance();
        SAXParser parser;
        try {

            Path path = Paths.get(xmlFilePath);
            Charset charset = StandardCharsets.UTF_8;
            String content = new String(Files.readAllBytes(path), charset);

            // Replace & with &amp;
            content = content.replaceAll("(&(?!amp;))", "&amp;");
            // content = content.replaceAll("(\"(?!quot;))", "&quot;");
            // Need a regex that replaces " with &quot; only in the specific place mentioned above

            // Write updated content to XML file
            Files.write(path, content.getBytes(charset));

            // XML Parsing
            parser = fact.newSAXParser();
            parser.parse(new File(xmlFilePath), SaxParseEx);
            System.out.println("PARSE SUCCESS");
            return;
        } catch (ParserConfigurationException e) {
            e.printStackTrace();
        } catch (SAXException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        System.out.println("PARSE FAILED");
    }
}

Output

org.xml.sax.SAXParseException; systemId: file:/home/system/TestSAXParse.xml; lineNumber: 14; columnNumber: 46; Element type "feast" must be followed by either attribute specifications, ">" or "/>".

I have replaced every & with &amp; to fix the SAXParseException on line 7. I cannot replace " with &quot; the same way, because that would also escape the quotes that delimit the attribute values.
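One thing to watch with the pattern `&(?!amp;)`: it also matches the ampersand at the start of any other entity, so once `&quot;` is present in the file a second run would turn it into `&amp;quot;`. A minimal sketch of a stricter pattern follows, assuming that only the five predefined XML entities and numeric character references should be left alone. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class AmpersandEscaper {
    // Matches '&' unless it already starts a predefined XML entity
    // (&amp; &lt; &gt; &quot; &apos;) or a numeric reference (&#38; or &#x26;).
    private static final Pattern BARE_AMP =
            Pattern.compile("&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)");

    static String escapeBareAmpersands(String text) {
        // "&amp;" contains no '$' or '\', so it is safe as a replacement string.
        return BARE_AMP.matcher(text).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        System.out.println(escapeBareAmpersands("'xxxxx_&_yyyy' and 'xxxxx&quot;yyyy'"));
    }
}
```

With this pattern the replacement can be run repeatedly on the same file without double-escaping entities that are already correct.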

EDIT:

I cannot use this answer. I'm looking for a different solution because:

  1. The XML file is large (> 100MB).
  2. So I think it is not feasible to compile a pattern and replace every line within double-quoted values, as suggested in that answer.
  3. I'm looking for a replace-all approach like:

content = content.replaceAll( "(&(?!amp;))", "&amp;");

Is there any possibility to write a regex like that?
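One possible shape for such a regex uses lookarounds to skip the quotes that delimit attribute values. This is only a heuristic sketch, not a general XML repair. It assumes that a delimiting quote always directly follows `=` or is directly followed by whitespace, `/`, `>`, `?` or the end of the line, and that user data never contains those combinations. The class and method names are hypothetical, and the `rewrite` helper streams line by line so a file over 100MB is never held in memory at once.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Pattern;

class InnerQuoteEscaper {
    // Heuristic (an assumption, not an XML rule): a '"' is an attribute delimiter
    // when it directly follows '=' or is directly followed by whitespace, '/',
    // '>', '?' or the end of the line. Every other '"' is treated as data.
    private static final Pattern INNER_QUOTE =
            Pattern.compile("(?<!=)\"(?![\\s/>?]|$)");

    static String escapeInnerQuotes(String line) {
        return INNER_QUOTE.matcher(line).replaceAll("&quot;");
    }

    // Reads the source one line at a time and writes the result to a separate file.
    static void rewrite(Path source, Path target, Charset charset) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(source, charset);
             BufferedWriter out = Files.newBufferedWriter(target, charset)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(escapeInnerQuotes(line));
                out.newLine(); // writes the platform line separator
            }
        }
    }

    public static void main(String[] args) {
        String line = "<feast key=\"NAME\" value=\"NAME TEST 'xxxxx\"yyyy' $\"/>";
        System.out.println(escapeInnerQuotes(line));
    }
}
```

The heuristic misfires if a user value contains a quote followed by a space or preceded by `=`, so it is worth checking the result against a sample of real data before relying on it.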

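I replaced every " with &quot; where it is enclosed in single quotes ('), by adding the lines below before the call to Files.write:

// Requires java.util.regex.Pattern and java.util.regex.Matcher
Pattern pattern = Pattern.compile("'(.*[\"].*)'");
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
    // String.replace treats its arguments literally; replaceAll would interpret
    // the matched text as a regex and break on characters such as $ or (
    content = content.replace(matcher.group(1), matcher.group(1).replace("\"", "&quot;"));
}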
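The loop above runs a whole-string replacement once per match, which can be slow on a file over 100MB. A single-pass variant can be sketched with `Matcher.appendReplacement`, assuming that the single-quoted text never spans lines and contains no nested apostrophes. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SingleQuotedEscaper {
    // A single-quoted run on one line that contains at least one double quote.
    private static final Pattern QUOTED =
            Pattern.compile("'([^'\\r\\n]*\"[^'\\r\\n]*)'");

    static String escapeQuotesInsideApostrophes(String content) {
        Matcher matcher = QUOTED.matcher(content);
        StringBuffer result = new StringBuffer(content.length() + 64);
        while (matcher.find()) {
            String fixed = "'" + matcher.group(1).replace("\"", "&quot;") + "'";
            // quoteReplacement keeps '$' and '\' in user data from being
            // interpreted as group references or escapes.
            matcher.appendReplacement(result, Matcher.quoteReplacement(fixed));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        String line = "<feast key=\"NAME\" value=\"NAME TEST 'xxxxx\"yyyy' $\"/>";
        System.out.println(escapeQuotesInsideApostrophes(line));
    }
}
```

Each match is rewritten where it is found, so the input is scanned once and user data is never reused as a regex.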
