
How can I get SimpleXML (Java) to return an ellipsis instead of &#8230; when parsing an Atom feed?

I have a line of XML in my Atom feed (UTF-8) formatted with an ellipsis, like this.

<title type="html"><![CDATA[THIS WEEK IN HISTORY&#8230;]]></title>

To access the title, I call title.getText().

  • Actual result: THIS WEEK IN HISTORY&#8230;
  • Expected result: THIS WEEK IN HISTORY…

Here's my Title class. What am I doing wrong with SimpleXML?

    public static class Title {

        @Attribute(name = "type", required = false)
        String type;
        @Text
        String text;

        public String getText() {
            return this.text;
        }

        void setText(String text) {
            this.text = text;
        }

        public String getType() {
            return this.type;
        }

        public void setType(String _value) {
            this.type = _value;
        }
    }

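SimpleXML is not doing anything wrong. Inside a CDATA section, &#8230; is literal text rather than a character reference, so the XML parser passes it through unchanged; type="html" tells the consumer that the content is escaped HTML which it must unescape itself. To do that, call StringEscapeUtils.unescapeHtml4("&#8230;"), which returns "…".

StringEscapeUtils is part of the Apache Commons Lang library (formerly Jakarta Commons Lang). Its unescapeHtml4() method unescapes a string containing entity escapes, both numeric references such as &#8230; and the named HTML 4.0 entities, into a string containing the corresponding Unicode characters.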
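To make the conversion concrete, here is a minimal, dependency-free sketch of the numeric part of that unescaping, using only java.util.regex. It is not the Commons Lang implementation: the class name NumericEntityDecoder and its decode() method are hypothetical, and it handles only numeric references (decimal and hexadecimal), not named entities such as &amp;. With Commons Lang on the classpath you would call StringEscapeUtils.unescapeHtml4(text) instead, for example inside getText().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of SimpleXML or Commons Lang.
public class NumericEntityDecoder {

    // Matches decimal (&#8230;) and hexadecimal (&#x2026;) character references.
    private static final Pattern NUMERIC_REF =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    public static String decode(String input) {
        if (input == null) {
            return null;
        }
        Matcher m = NUMERIC_REF.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the text between the previous match and this one unchanged.
            out.append(input, last, m.start());
            int codePoint = (m.group(1) != null)
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2), 10);
            if (Character.isValidCodePoint(codePoint)) {
                out.appendCodePoint(codePoint);
            } else {
                // Leave references outside the Unicode range untouched.
                out.append(m.group());
            }
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // Same string as the feed title in the question.
        System.out.println(decode("THIS WEEK IN HISTORY&#8230;"));
    }
}
```

In the Title class from the question, the decoded value could be returned from getText() so that callers never see the raw reference; whether to decode there or at display time is a design choice that depends on whether other consumers need the original escaped HTML.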

Use StringEscapeUtils.unescapeHtml4() from the Apache Commons Lang library.

