Java 8 function to remove invalid whitespace from XML tags

I'm attempting a data migration exercise, loading a series of XML files into a new format. The problem I've identified is that the XML from the legacy app was generated with invalid tags, i.e. some of the tag names contain whitespace, e.g.:

<Serial>0001</Serial>
<Document ID>12345</Document ID>
<Document Type>TypeA</Document Type>

Unfortunately, there is no staff resource on the legacy system to correct the XML, so my only option is to fix it as I process the data.

Can anyone help with a neat way of using Java 8 to remove the whitespace from the XML tags, so that they become valid and can be parsed?

My original code to extract the XML was:

final XmlMapper xmlMapper = new XmlMapper();
final JsonNode jsonNode = xmlMapper.readTree(metadata);
return objectMapper.convertValue(jsonNode, new TypeReference<Map<String, String>>() {});

Many thanks

This is not a Java 8-specific solution, but it does what you ask. With the entire XML held in a string, it uses a regular expression to find each XML tag and removes every space character inside it. At the end, correctXmlString contains valid XML tags.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String wrongXmlString = "<Document ID>12345</Document ID>";

// Regex for matching XML tags
Matcher matcher = Pattern.compile("<[^>]+>").matcher(wrongXmlString);

// Buffer for building the corrected XML
// (in Java 8, appendReplacement only accepts a StringBuffer)
StringBuffer xmlBuilder = new StringBuffer();

while (matcher.find())
{
  // for every match (i.e., for each tag)
  String tag = matcher.group();
  // remove any spaces and append the corrected tag;
  // quoteReplacement keeps any '$' or '\' in the tag literal
  matcher.appendReplacement(xmlBuilder, Matcher.quoteReplacement(tag.replaceAll(" +", "")));
}

matcher.appendTail(xmlBuilder);

String correctXmlString = xmlBuilder.toString();
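
For reuse across many files, the same approach could be wrapped in a small helper, roughly as sketched below. The class name XmlTagFixer and the method name removeSpacesInTags are hypothetical and used only for illustration. The sketch assumes the legacy XML has no attributes, XML declaration, or comments: the pattern strips every space between '<' and '>', so the spaces that separate attributes would be removed as well.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative, not from any library.
final class XmlTagFixer {

    // Matches everything from '<' to the next '>', i.e. one opening or closing tag.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    // Assumption: tags carry no attributes, so every space inside a tag is invalid.
    static String removeSpacesInTags(String xml) {
        Matcher matcher = TAG.matcher(xml);
        // Java 8's appendReplacement only accepts a StringBuffer.
        StringBuffer fixed = new StringBuffer();
        while (matcher.find()) {
            String tag = matcher.group().replace(" ", "");
            // quoteReplacement keeps any '$' or '\' in the tag literal.
            matcher.appendReplacement(fixed, Matcher.quoteReplacement(tag));
        }
        // Copy whatever follows the last tag.
        matcher.appendTail(fixed);
        return fixed.toString();
    }

    public static void main(String[] args) {
        String legacy = "<Serial>0001</Serial>\n"
                + "<Document ID>12345</Document ID>\n"
                + "<Document Type>TypeA</Document Type>";
        System.out.println(removeSpacesInTags(legacy));
    }
}
```

Run against the sample from the question, this prints the three elements with the tags renamed to Serial, DocumentID and DocumentType. Text between tags is left alone because the pattern only matches the tags themselves, so the cleaned string can then be passed to the original xmlMapper.readTree call.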
