Ignore whitespace while converting XML to JSON using XStream

I'm trying to find a reliable and fast way to transform XML to JSON in Java, and I've started using XStream for the task. However, when I run the code below, the test fails because of the whitespace (including newlines) in the XML. If I remove these characters, the test passes.

    @Test
    public void testXmlWithWhitespaceBeforeStartElementCanBeConverted() throws Exception {
        String xml =
                "<root>\n" +
                "   <foo>bar</foo>\n" + // remove the newlines and white space to make the test pass
                "</root>";
        String expectedJson = "{\"root\": {\n" +
                "  \"foo\": bar\n" +
                "}}";
        String actualJSON = transformXmlToJson(xml);

        Assert.assertEquals(expectedJson, actualJSON);
    }

    private String transformXmlToJson(String xml) throws XmlPullParserException {
        XmlPullParser parser =  XppFactory.createDefaultParser();
        HierarchicalStreamReader reader = new XppReader(new StringReader(xml), parser, new NoNameCoder());
        StringWriter write = new StringWriter();
        JsonWriter jsonWriter = new JsonWriter(write);
        HierarchicalStreamCopier copier = new HierarchicalStreamCopier();
        copier.copy(reader, jsonWriter);
        jsonWriter.close();
        return write.toString();
    }

The test fails with the exception:

com.thoughtworks.xstream.io.json.AbstractJsonWriter$IllegalWriterStateException: Cannot turn from state SET_VALUE into state START_OBJECT for property foo
    at com.thoughtworks.xstream.io.json.AbstractJsonWriter.handleCheckedStateTransition(AbstractJsonWriter.java:265)
    at com.thoughtworks.xstream.io.json.AbstractJsonWriter.startNode(AbstractJsonWriter.java:227)
    at com.thoughtworks.xstream.io.json.AbstractJsonWriter.startNode(AbstractJsonWriter.java:232)
    at com.thoughtworks.xstream.io.copy.HierarchicalStreamCopier.copy(HierarchicalStreamCopier.java:36)
    at com.thoughtworks.xstream.io.copy.HierarchicalStreamCopier.copy(HierarchicalStreamCopier.java:47)
    at testConvertXmlToJSON.transformXmlToJson(testConvertXmlToJSON.java:30)

Is there a way to tell the copy process to ignore the ignorable whitespace? I cannot find any obvious way to enable this behaviour, but I think it should exist. I know I could pre-process the XML to remove the whitespace, or just use another library.
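For reference, the pre-processing option mentioned above could look roughly like the sketch below. It uses only the JDK's DOM and transformer APIs, not XStream, and the class and method names (`XmlWhitespaceStripper`, `stripIgnorableWhitespace`) are made up for illustration. It assumes that whitespace-only text inside an element that also has child elements is just indentation, so it would also drop a significant blank between two inline elements in mixed content.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of XStream.
public class XmlWhitespaceStripper {

    /** Returns the XML with whitespace-only text nodes between elements removed. */
    public static String stripIgnorableWhitespace(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        removeBlankTextNodes(doc.getDocumentElement());

        // Serialize without an XML declaration and without re-indenting.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    private static void removeBlankTextNodes(Node node) {
        // Assumption: blank text only counts as ignorable when the parent
        // also contains child elements; blank leaf values are left alone.
        boolean hasElements = hasElementChild(node);
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // fetch before a possible removal
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                removeBlankTextNodes(child);
            } else if (hasElements
                    && child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            }
            child = next;
        }
    }

    private static boolean hasElementChild(Node node) {
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                return true;
            }
        }
        return false;
    }
}
```

In the test this would be used as `transformXmlToJson(XmlWhitespaceStripper.stripIgnorableWhitespace(xml))`. The cost is an extra parse and serialization of the whole document, which is why it does not feel like a fast solution for large inputs.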

Update: I can work around the issue by writing a decorator for the HierarchicalStreamReader interface and suppressing the whitespace nodes manually, although this still does not feel ideal. It looks something like the code below, which makes the test pass.

    public class IgnoreWhitespaceHierarchicalStreamReader implements HierarchicalStreamReader {
        private HierarchicalStreamReader innerHierarchicalStreamReader;

        public IgnoreWhitespaceHierarchicalStreamReader(HierarchicalStreamReader hierarchicalStreamReader) {
            this.innerHierarchicalStreamReader = hierarchicalStreamReader;
        }

        public String getValue() {
            String value = innerHierarchicalStreamReader.getValue();
            System.out.printf("getValue = '%s'%n", value);
            // Whitespace-only text on a node that still has child elements is
            // indentation between tags, so report it as an empty value.
            if (innerHierarchicalStreamReader.hasMoreChildren() && value.matches("\\s+")) {
                System.out.println("*** White space value suppressed");
                value = "";
            }
            return value;
        }
        // rest of interface ...

Any help is appreciated.

Comparing two XML documents as String objects is not a good idea. How are you going to handle the case where the XML is equivalent but the nodes are not in the same order?

For example,

    <xml><node1>1</node1><node2>2</node2></xml>

is equivalent to

    <xml><node2>2</node2><node1>1</node1></xml>

but a String comparison will always report them as different.

Instead, use a tool such as XMLUnit. See the following question for more details:

Best way to compare 2 XML documents in Java
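To illustrate the point without pulling in a library, here is a rough JDK-only sketch of an order-insensitive comparison. This is not how XMLUnit works internally and it is no substitute for it: the class name `XmlUnorderedCompare` is hypothetical, and the sketch assumes attributes, namespaces and comments can be ignored. It builds a canonical string for each element in which the child elements are sorted, then compares the two canonical strings.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only; ignores attributes and namespaces.
public class XmlUnorderedCompare {

    /** Returns true when both documents have the same elements and text, in any sibling order. */
    public static boolean sameIgnoringChildOrder(String a, String b) throws Exception {
        return canonical(parse(a).getDocumentElement())
                .equals(canonical(parse(b).getDocumentElement()));
    }

    private static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Builds a string form of the element in which child elements are sorted,
    // so two documents that differ only in sibling order produce the same string.
    private static String canonical(Element element) {
        List<String> children = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        for (Node c = element.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                children.add(canonical((Element) c));
            } else if (c.getNodeType() == Node.TEXT_NODE
                    || c.getNodeType() == Node.CDATA_SECTION_NODE) {
                text.append(c.getNodeValue());
            }
        }
        Collections.sort(children);
        String name = element.getTagName();
        // Text is trimmed, so indentation between tags does not affect the result.
        return "<" + name + ">" + text.toString().trim()
                + String.join("", children) + "</" + name + ">";
    }
}
```

With the two documents above, `XmlUnorderedCompare.sameIgnoringChildOrder(first, second)` returns `true`, whereas `first.equals(second)` returns `false`.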
