Java: remove < and > from text in XML (not tags)

I'm having a hard time escaping XML so that it can be processed by Java. I'm using JTidy to escape unwanted characters, but I struggle to remove "<" and ">" from element values such as <tag> capacity < 1000 </tag>

I'm using the code below to escape the input:

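    import java.io.StringReader;
    import java.io.StringWriter;
    import org.w3c.tidy.Tidy;

    public String CleanXML(String input){

        Tidy tidy = new Tidy();
        tidy.setInputEncoding("UTF-16");
        tidy.setOutputEncoding("UTF-16");
        // Do not wrap long lines in the output.
        tidy.setWraplen(Integer.MAX_VALUE);
        // Treat the input as XML and emit XML.
        tidy.setXmlOut(true);
        tidy.setSmartIndent(true);
        tidy.setXmlTags(true);
        tidy.setMakeClean(true);
        // Produce output even if errors are found, and suppress diagnostics.
        tidy.setForceOutput(true);
        tidy.setQuiet(true);
        tidy.setShowWarnings(false);
        StringReader in = new StringReader(input);
        StringWriter out = new StringWriter();
        tidy.parse(in, out);

        return out.toString();
    }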

Use the following function:

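    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Captures the text between <tag> and </tag>; DOTALL lets a value span lines.
    private static final Pattern TAG_REGEX = Pattern.compile("<tag>(.+?)</tag>", Pattern.DOTALL);

    public String CleanXML(String input){
        final Matcher matcher = TAG_REGEX.matcher(input);
        while (matcher.find()) {
            String value = matcher.group(1);
            // Keep only letters, digits and whitespace in the value.
            String valueReplace = value.replaceAll("[^a-zA-Z0-9\\s]", "");
            // Strings are immutable, so the result of replace() must be assigned back.
            input = input.replace(value, valueReplace);
        }
        return input;
    }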

It uses a regular expression search to get the values between the tags, then removes every character that is not alphanumeric or whitespace. The regular expression and the basic idea come from "Java regex to extract text between tags".
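As a rough illustration of the approach described above, here is a minimal self-contained sketch. The class name TagTextCleaner, the method name clean and the sample input are assumptions made for this demo. It hard-codes the element name tag, as the answer does, and rebuilds the string with Matcher.appendReplacement instead of String.replace so that only the matched element is touched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the regular expressions come from the answer.
final class TagTextCleaner {
    private static final Pattern TAG_REGEX =
            Pattern.compile("<tag>(.+?)</tag>", Pattern.DOTALL);

    // Removes every character that is not a letter, digit or whitespace
    // from the text between <tag> and </tag>, leaving the tags themselves intact.
    static String clean(String input) {
        Matcher matcher = TAG_REGEX.matcher(input);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String cleaned = matcher.group(1).replaceAll("[^a-zA-Z0-9\\s]", "");
            // quoteReplacement keeps the replacement text literal.
            matcher.appendReplacement(out,
                    Matcher.quoteReplacement("<tag>" + cleaned + "</tag>"));
        }
        // Copy whatever follows the last match.
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(clean("<root><tag> capacity < 1000 </tag></root>"));
    }
}
```

Note that this drops the "<" entirely rather than escaping it, so the comparison in the value is lost; if the meaning has to survive, replacing "<" with &lt; inside the captured text would be the usual alternative.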

If you want to remove the tag delimiters of the XML, convert it to a map and build the string you need; see "XML to map in Java".

If you want to clean attribute values, you can iterate over the map and clean each value, then either build a string or convert the map back to XML; see "map to XML in Java".
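The map-based suggestion could look roughly like the sketch below. It assumes the XML has already been converted to a flat Map<String, String> of element names to text values; that conversion is what the linked answers cover and is not shown here. MapValueCleaner, cleanValues and toXml are hypothetical names, and the root wrapper element is likewise an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: cleans the values of an already-parsed, flat map
// and writes them back out as a simple XML string.
final class MapValueCleaner {

    // Returns a copy of the map whose values keep only letters, digits and whitespace.
    static Map<String, String> cleanValues(Map<String, String> values) {
        Map<String, String> cleaned = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : values.entrySet()) {
            cleaned.put(entry.getKey(),
                    entry.getValue().replaceAll("[^a-zA-Z0-9\\s]", ""));
        }
        return cleaned;
    }

    // Builds <rootName><key>value</key>...</rootName>. No escaping is done here,
    // so this is only safe for values that have already been cleaned.
    static String toXml(String rootName, Map<String, String> values) {
        StringBuilder sb = new StringBuilder();
        sb.append('<').append(rootName).append('>');
        for (Map.Entry<String, String> entry : values.entrySet()) {
            sb.append('<').append(entry.getKey()).append('>')
              .append(entry.getValue())
              .append("</").append(entry.getKey()).append('>');
        }
        sb.append("</").append(rootName).append('>');
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("tag", " capacity < 1000 ");
        System.out.println(toXml("root", cleanValues(values)));
    }
}
```

A LinkedHashMap is used so the elements come back out in the order they were inserted. Nested elements and attributes would need a richer structure than a flat map.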
