How to unescape non-standard characters in XML in Java?

I realize a similar question has been asked before, and the solution there was to use StringEscapeUtils.unescapeXml(). However, per the method description:

Supports only the five basic XML entities (gt, lt, quot, amp, apos). Does not support DTDs or external entities.

I have a bunch of XML files with escaped characters like &blank; and &hyph;. How can I unescape these? They are defined in the DTD provided. Is there a method like StringEscapeUtils, but with DTD support?

It's been a long time, but I think implementing EntityResolver2 (org.xml.sax.ext, included in the JDK) lets the parser resolve externally defined entities, such as the ones declared in your DTD. It is part of the SAX2 extensions.
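
A minimal sketch of that idea, not a definitive implementation: rather than unescaping strings by hand, let a SAX parser read the document and hand it the DTD through an EntityResolver2, so the parser expands the entities itself. The class name DtdEntityUnescape, the file name entities.dtd, the inline DTD string, and the code points chosen for blank and hyph are all assumptions for illustration; in practice the resolver would return an InputSource for the real DTD file.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

public class DtdEntityUnescape {
    // Hypothetical stand-in for the DTD shipped with the XML files.
    // The code points are assumed, not taken from the real DTD.
    public static final String DTD =
        "<!ENTITY blank \"&#x2423;\">\n" +
        "<!ENTITY hyph \"&#x2010;\">\n";

    // Sample document that references the DTD by a (hypothetical) system ID.
    public static final String XML =
        "<?xml version=\"1.0\"?>\n" +
        "<!DOCTYPE doc SYSTEM \"entities.dtd\">\n" +
        "<doc>a&blank;b&hyph;c</doc>";

    /** Parses the XML and returns its text content with DTD entities expanded. */
    public static String unescape(String xml) throws Exception {
        StringBuilder text = new StringBuilder();

        // DefaultHandler2 implements EntityResolver2 as well as ContentHandler.
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public InputSource resolveEntity(String name, String publicId,
                                             String baseURI, String systemId) {
                // Supply the DTD ourselves instead of letting the parser fetch it.
                if (systemId != null && systemId.endsWith("entities.dtd")) {
                    return new InputSource(new StringReader(DTD));
                }
                return null; // fall back to the parser's default resolution
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Entity references arrive here already expanded.
                text.append(ch, start, length);
            }
        };

        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(unescape(XML));
    }
}
```

Because the parser does the expansion, every entity declared in the DTD is covered without listing them in code. The trade-off is that the input has to be a well-formed XML document with a DOCTYPE, not an arbitrary string fragment.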

