
replaceAll not working on escape character XML

I'm trying to parse XML into JSON using Java. JSON.parse is throwing this error on this character:

JSON.parse: bad control character in string literal

I try to replace these characters before passing the string to JSON.parse, but this line of code is not working. Is there a better way to replace or remove these characters completely?

String trim = desc.replaceAll("\n", "\\n");
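Assuming the offending character is a line feed (the original character did not survive in the question), one likely reason the line above misbehaves is that `String.replaceAll` treats its second argument as a regex replacement string, in which a backslash escapes the next character. The Java literal `"\\n"` is the two characters `\` and `n`, so every line feed is replaced by a bare `n` rather than by the JSON escape `\n`. The sketch below contrasts that with two literal alternatives; the class and method names are hypothetical, and handling `\r` as well is an assumption in case carriage returns are also present.

```java
import java.util.regex.Matcher;

public class ReplaceDemo {

    // Mirrors the question's line: the replacement text is \n, where '\'
    // escapes 'n', so only the letter "n" is inserted.
    static String viaReplaceAll(String s) {
        return s.replaceAll("\n", "\\n");
    }

    // String.replace(CharSequence, CharSequence) is purely literal, so this
    // inserts a backslash followed by 'r' or 'n' (assumes CR may also occur).
    static String viaReplace(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    // Keeps replaceAll but quotes the replacement so '\' is taken literally.
    static String viaQuotedReplacement(String s) {
        return s.replaceAll("\n", Matcher.quoteReplacement("\\n"));
    }

    public static void main(String[] args) {
        String desc = "line one\nline two";
        System.out.println(viaReplaceAll(desc));        // line onenline two
        System.out.println(viaReplace(desc));           // line one\nline two
        System.out.println(viaQuotedReplacement(desc)); // line one\nline two
    }
}
```

Either literal variant produces text that JSON.parse decodes back into a real newline, but it still only covers the characters you thought of.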

XML to be parsed

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod 
    tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim 
    veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea 
    commodo consequat. Duis aute irure dolor in reprehenderit in voluptate 
    velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint 
    occaecat cupidatat non proident, sunt in culpa qui officia deserunt 
    mollit anim id est laborum.

If the example you have shown is the complete XML input, then you are not parsing XML at all.

Assuming this is only a fragment: your solution escapes just one character, but to get valid JSON you should escape every character that is not allowed in a JSON string or that would lead to unwanted behaviour. So it is a good idea to use something that can properly escape JSON for you, for example:

Java escape JSON String?
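If adding a JSON library is not an option, a hand-rolled escaper only has to follow the JSON string grammar (RFC 8259): a string may not contain a raw quotation mark, backslash, or control character U+0000 through U+001F, and a raw control character is exactly what triggers "bad control character in string literal". This is a minimal sketch under that assumption; `JsonEscape` and `escapeJson` are hypothetical names, not part of any library.

```java
public final class JsonEscape {

    private JsonEscape() {}

    // Escapes a Java string so it can be placed between double quotes in JSON.
    // Handles '"', '\\' and every control character below U+0020 (RFC 8259).
    public static String escapeJson(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\b': out.append("\\b");  break;
                case '\f': out.append("\\f");  break;
                case '\n': out.append("\\n");  break;
                case '\r': out.append("\\r");  break;
                case '\t': out.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // Other control characters have no short form, so
                        // emit the six-character hex escape (u plus 4 digits).
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String desc = "Lorem ipsum\r\n\tdolor \"sit\" amet";
        // prints {"description": "Lorem ipsum\r\n\tdolor \"sit\" amet"}
        System.out.println("{\"description\": \"" + escapeJson(desc) + "\"}");
    }
}
```

Unlike stripping characters, this keeps the line breaks and tabs from the XML text; JSON.parse on the receiving side turns the escapes back into the original characters.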

Figured it out:

    public static String cleanDescription(String desc) {
        String trim = desc.replaceAll("<.*?>", ""); // removes HTML elements
        if (trim.isEmpty()) return trim;

        // There's a phantom question mark that sometimes gets added to the front and end of the string
        if (!Character.isLetter(trim.charAt(0))) trim = trim.substring(1);

        // Count how many of the last three characters are neither letters nor digits
        int charCount = 0;
        for (int j = 1; j <= 3 && j <= trim.length(); j++) {
            if (!Character.isLetterOrDigit(trim.charAt(trim.length() - j))) charCount++;
        }
        // If at least two are, cut (charCount - 1) characters off the end
        if (charCount >= 2) trim = trim.substring(0, trim.length() - (charCount - 1));

        // Replace every character except ASCII letters, digits, parentheses, periods and commas with a space
        trim = trim.replaceAll("[^a-zA-Z0-9().,]", " ");

        return trim.trim();
    }
