Java 8 function to remove invalid whitespace from XML tags
I'm attempting a data migration, loading a series of XML files into a new format. The problem I've identified is that the legacy app generated the XML with invalid tags: some of the tag names contain whitespace, e.g.:
<Serial>0001</Serial>
<Document ID>12345</Document ID>
<Document Type>TypeA</Document Type>
Unfortunately, there is no staff resource from the legacy system to correct the XML, so my only option is to fix it as I process the data.
Can anyone help with a neat way of using Java 8 to remove the whitespace from the XML tags, so that they can be parsed?
My original code to extract the XML was:
final XmlMapper xmlMapper = new XmlMapper();
final JsonNode jsonNode = xmlMapper.readTree(metadata);
return objectMapper.convertValue(jsonNode, new TypeReference<Map<String, String>>() {});
Many thanks
This is not a Java 8-specific solution, but it does what you ask for. With the entire XML held as a string, it uses pattern matching to identify the XML tags and removes any space characters inside them. At the end, correctXmlString has valid XML tags.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String wrongXmlString = "<Document ID>12345</Document ID>";
// Regex for matching XML tags
Matcher matcher = Pattern.compile("<[^>]+>").matcher(wrongXmlString);
// Buffer for building the corrected XML (on Java 8, appendReplacement only accepts a StringBuffer)
StringBuffer xmlBuilder = new StringBuffer();
while (matcher.find())
{
    // for every match (i.e., for each tag)
    String tag = matcher.group();
    // remove any spaces and append the corrected tag;
    // quoteReplacement keeps '$' and '\' in the tag from being read as group references
    matcher.appendReplacement(xmlBuilder, Matcher.quoteReplacement(tag.replaceAll(" +", "")));
}
matcher.appendTail(xmlBuilder);
String correctXmlString = xmlBuilder.toString();
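One caveat with stripping every space inside a tag is that a tag carrying attributes, such as <a href="x">, would be collapsed into <ahref="x">. The following is a minimal sketch of the same regex approach wrapped in a reusable method, under the assumption that any tag containing an '=' has attributes and should be left untouched. The class name TagWhitespaceFixer and the method name fixTags are hypothetical, chosen only for illustration, and the '=' heuristic is an assumption rather than part of the original answer. It also treats comments and CDATA sections like ordinary tags, so it is only suitable when the legacy files do not rely on whitespace inside those.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagWhitespaceFixer {

    // Matches anything that looks like a tag: '<', one or more non-'>' characters, then '>'.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    /** Returns the input with whitespace removed from inside each attribute-free tag. */
    static String fixTags(String xml) {
        Matcher matcher = TAG.matcher(xml);
        // StringBuffer because Matcher.appendReplacement only accepts a StringBuffer on Java 8.
        StringBuffer fixed = new StringBuffer();
        while (matcher.find()) {
            String tag = matcher.group();
            // Assumption: a tag containing '=' carries attributes, so it is left as-is.
            String cleaned = tag.indexOf('=') >= 0 ? tag : tag.replaceAll("\\s+", "");
            // quoteReplacement stops '$' and '\' in the tag from being read as group references.
            matcher.appendReplacement(fixed, Matcher.quoteReplacement(cleaned));
        }
        matcher.appendTail(fixed);
        return fixed.toString();
    }

    public static void main(String[] args) {
        String legacy = "<Serial>0001</Serial>\n"
                + "<Document ID>12345</Document ID>\n"
                + "<Document Type>TypeA</Document Type>";
        // Prints the three elements with DocumentID and DocumentType as tag names.
        System.out.println(fixTags(legacy));
    }
}
```

The cleaned string could then be handed to the parser in place of the raw input, for example xmlMapper.readTree(TagWhitespaceFixer.fixTags(metadata)), assuming metadata is available as a String.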