
How to remove < and > in XML that is part of the XML message

I have XML that looks as follows:

<StartTag>
    <MyValueTag>And the value itself contains a < bracket that makes the XML invalid</MyValueTag>
</StartTag>

The XML contains a '<' character that makes the XML invalid.

The easiest way would be to fix the source of the XML, but unfortunately I don't have control over the XML creation. It has messages like "The value is < than 10", where the "<" is supposed to mean "less than".

Is there any way I can check the XML for things like this and escape those characters?

I looked at this post, where the author suggested using JTidy. But when I tried it, it didn't remove the <:

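import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import org.w3c.tidy.Tidy;

// data is the String holding the broken XML message
Tidy tidy = new Tidy();
tidy.setInputEncoding("UTF-8");
tidy.setOutputEncoding("UTF-8");
tidy.setWraplen(Integer.MAX_VALUE);
tidy.setPrintBodyOnly(true);
tidy.setXmlOut(true);
tidy.setSmartIndent(true);
ByteArrayInputStream inputStream = new ByteArrayInputStream(data.getBytes("UTF-8"));
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
// Tidy writes its cleaned-up output to outputStream
tidy.parseDOM(inputStream, outputStream);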

The fact that the XML is invalid means you aren't going to be able to use a conforming XML parser to read it and fix it. If you can't get the authors of the software that writes the file to fix the bug, then you will have to come up with some application-specific solution.
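A parser can still be useful for the "check" half of the question: it will not repair the message, but it can tell you which messages need repair. Below is a minimal sketch using the JDK's built-in DOM parser; the class name WellFormedCheck and the method isWellFormed are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of any library.
final class WellFormedCheck {

    // Returns true if the JDK's DOM parser accepts the text, false if it rejects it.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // A stray '<' in element text ends up here as a parse error.
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O problems are not about the message itself.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String broken = "<StartTag><MyValueTag>The value is < than 10</MyValueTag></StartTag>";
        System.out.println(isWellFormed(broken));
    }
}
```

Messages that pass this check can be processed normally; only the ones that fail need the application-specific repair.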

For example, if you knew that the stray < character only occurs in the text of a <MyValueTag> element, and if you knew that no other elements could occur as children of <MyValueTag>, then it would be pretty easy to write a program that recognizes the start and end tags and replaces any < characters that occur between them with &#60;
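That idea could be sketched roughly as follows. This is only an illustration under the assumptions just stated, plus two more: the start tag carries no attributes, and the element never nests inside itself. The class name StrayBracketEscaper and the method escapeInElement are hypothetical. It also escapes >, which is not strictly required in element text but is harmless.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: escapes stray '<' and '>' inside one known text-only element.
final class StrayBracketEscaper {

    static String escapeInElement(String xml, String tagName) {
        String tag = Pattern.quote(tagName);
        // Group 1: start tag, group 2: raw text, group 3: end tag.
        // Assumes no attributes on the start tag and no child elements.
        Pattern p = Pattern.compile("(<" + tag + ">)(.*?)(</" + tag + ">)", Pattern.DOTALL);
        Matcher m = p.matcher(xml);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy everything before this element unchanged.
            out.append(xml, last, m.start());
            // Replace the brackets in the text with numeric character references.
            String text = m.group(2).replace("<", "&#60;").replace(">", "&#62;");
            out.append(m.group(1)).append(text).append(m.group(3));
            last = m.end();
        }
        // Copy whatever follows the last matched element.
        out.append(xml, last, xml.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String broken = "<StartTag>\n    <MyValueTag>The value is < than 10</MyValueTag>\n</StartTag>";
        System.out.println(escapeInElement(broken, "MyValueTag"));
    }
}
```

The repaired string can then be handed to a normal XML parser. If the element can legitimately contain child elements or attributes, this sketch will damage them, so it only applies when the stated assumptions hold.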

Of course, if the problem isn't that simple, then the solution won't be that simple; but hopefully, you can make it simpler than solving the general problem for XML.

After you've fixed a few files "by hand," stop and ask yourself, "How did I know that this < character needed to be escaped?" Then write a program that operates on that same knowledge.
