Java to create XML and using XSL to create HTML escaped characters

I have a bit of an issue:

  1. Get user data with Java
  2. Generate XML using JAXB
  3. Create my XSL template
  4. Use Java to generate the HTML
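For reference, steps 3 and 4 can be sketched with the XSLT support bundled in the JDK (`javax.xml.transform`). This is only a minimal sketch under assumptions: the `Mail` root element and the inline stylesheet are hypothetical stand-ins, since the question does not show the real XML structure or template.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XslToHtmlSketch {

    // Hypothetical stylesheet: renders the <Date> element inside a paragraph.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"html\" indent=\"no\"/>"
      + "<xsl:template match=\"/Mail\">"
      + "<p><xsl:value-of select=\"Date\"/></p>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to an XML string and returns the generated HTML.
    static String toHtml(String xml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // The text is passed in unescaped; the XML and HTML layers handle escaping.
        String xml = "<Mail><Date>Mon, 29 Feb 2016 13:40:58 EST (-0500)</Date></Mail>";
        System.out.println(toHtml(xml));
    }
}
```

Note that the date text goes in as plain characters. None of the characters in it (comma, colon, parentheses, hyphen) needs escaping in XML or HTML.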

Now I have an issue with \r and \n and some other funky symbols. Should I escape the content of my XML with XML escapes or HTML escapes? The default Java escape utility class is doing a poor job of it, and the custom class I found online isn't working either.

Would a good solution be to just replace \n and \r with <p> </p>, or what HTML tag would be a good choice? Thank you!
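One way to approach the line-break question, offered only as a sketch: rather than injecting HTML tags into the text (which would themselves be escaped on the way into XML), split the text into lines in Java, store one XML element per line, and let the XSL template wrap each line in `<p>` or follow it with `<br/>`. The class and method names below are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class LineSplitSketch {

    // Splits text on any line break. The regex \R (Java 8+) matches
    // \r\n as a single break, as well as a lone \n or a lone \r.
    static List<String> toLines(String text) {
        return Arrays.asList(text.split("\\R"));
    }

    public static void main(String[] args) {
        List<String> lines = toLines("first line\r\nsecond line\nthird line");
        // Each entry could become its own XML element (for example via a
        // List<String> field in the JAXB class), with no \r or \n left inside.
        for (String line : lines) {
            System.out.println("[" + line + "]");
        }
    }
}
```

With this layout the XSL template decides how lines are rendered, and no markup has to travel inside the text values.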

A simple example would be my date value in my XML, which was passed in as a string with all the escapes applied.

Original: (same time, I don't remember which) - Mon, 29 Feb 2016 13:40:58 EST (-0500)

Escaped XML entry: - <Date>Mon&amp;#044; 29 Feb 2016 03&amp;#058;40&amp;#058;43 EST&amp;#040;&amp;#045;0500&amp;#041;</Date>

Parsed HTML output: - Mon&#044; 29 Feb 2016 03&#058;40&#058;43 EST&#040;&#045;0500&#041;

Something clearly went wrong in the encoding and decoding of the special characters when this is parsed into HTML.
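The `&amp;#044;` pattern is what double escaping looks like: the text was already turned into `&#044;` before it reached the XML writer, and the writer then escaped the `&` again. The sketch below reproduces this with the JDK's DOM and identity `Transformer`, used here as a stand-in for JAXB (an assumption made so the example runs without extra dependencies; the JAXB marshaller likewise escapes `&` in text content by default).

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DoubleEscapeSketch {

    // Builds <Date>text</Date> and serializes it; the serializer does the escaping.
    static String serializeDate(String text) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();
        Element date = doc.createElement("Date");
        date.setTextContent(text);
        doc.appendChild(date);

        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Raw text: nothing here needs escaping, so it passes through unchanged.
        System.out.println(serializeDate("Mon, 29 Feb 2016"));
        // prints <Date>Mon, 29 Feb 2016</Date>

        // Pre-escaped text: the '&' of "&#044;" is escaped a second time.
        System.out.println(serializeDate("Mon&#044; 29 Feb 2016"));
        // prints <Date>Mon&amp;#044; 29 Feb 2016</Date>
    }
}
```

If this is what is happening, the likely fix is to hand the raw string to the XML layer and drop the manual escaping step.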

EDIT: I also have this junk, which I don't even recognize: &#xD;

EDIT: I fixed the date issue, but it's still not encoding properly in parts.

public static String entityEncode(String text) {
    String result = text;
    if (result == null)
        return result;
    return StringEscapeUtils.escapeXml(XMLStringUtil.escapeControlChrs(result));
}

And the other class is:

import java.util.HashSet;

public class XMLStringUtil {

    private static HashSet<Character> illegalChrSet = new HashSet<>();

    static {
        final String illegalChrs = "\u0000\u0001\u0002\u0003\u0004\u0005" +
                "\u0006\u0007\u0008\u000B\u000C\u000E\u000F\u0010\u0011\u0012" +
                "\u0013\u0014\u0015\u0016\u0017\u0018\u0019\u001A\u001B\u001C" +
                "\u001D\u001E\u001F\uFFFE\uFFFF";

        for (int i=0; i < illegalChrs.length(); i++) {
            illegalChrSet.add(illegalChrs.charAt(i));
        }
    }

    public static String escapeControlChrs(String str) {
        if (str == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(str.length());
        for (int i=0; i < str.length(); i++) {
            char chr = str.charAt(i);
            if (illegalChrSet.contains(chr)) {
                sb.append("\\x");
                sb.append(String.format("%04x", (int) chr));
            } else {
                sb.append(chr);
            }
        }

        return sb.toString();
    }

    public static String removeControlChrs(String str) {
        if (str == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(str.length());
        for (int i=0; i < str.length(); i++) {
            char chr = str.charAt(i);
            if (! illegalChrSet.contains(chr)) {
                sb.append(chr);
            }
        }

        return sb.toString();
    }
}

but I still get this junk in the XML:

<Info>The origin domain used for comparison was: &#xD;
google.ca.ca&#xD;
blah blah blah&#xD;
</Info>

It occurs on new lines.
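For context, `&#xD;` is the XML character reference for a carriage return (U+000D, the `\r` in a Windows-style `\r\n` line ending). Serializers write it as a reference because a parser would otherwise normalize a literal `\r` to `\n`, so the XML is valid, just noisy. A minimal sketch for removing it, assuming the text can be normalized before it is handed to JAXB (the helper name is hypothetical):

```java
public class LineEndingSketch {

    // Converts Windows (\r\n) and old Mac (\r) line endings to a plain \n,
    // so no carriage return is left for the XML writer to emit as &#xD;.
    static String normalizeLineEndings(String text) {
        if (text == null) {
            return null;
        }
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) {
        String raw = "The origin domain used for comparison was: \r\ngoogle.ca.ca\r\n";
        String clean = normalizeLineEndings(raw);
        System.out.println(clean.indexOf('\r')); // prints -1
    }
}
```

Note that `\r` and `\n` are not in the `illegalChrSet` above, which is why `escapeControlChrs` leaves them untouched.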

The problem is in how you are encoding to XML itself. HTML is parsing the values properly: for HTML, &amp; is &. Please check how you are encoding to XML. The XML should not contain all those ASCII character codes.

Basically, your string contains the character '/'. When it is encoded for XML, it gets converted to an escaped form that HTML does not recognize. Instead, when creating the XML, replace '/' with a form that HTML understands, and when decoded, HTML will automatically convert it back to '/'.
