Include delimiter when performing substring operation

How do I include the delimiter when performing a substring operation?

That is, given the string message, which looks like this:

<nutrition>
<daily-values>
    <total-fat units="g">65</total-fat>
    <saturated-fat units="g">20</saturated-fat>
    <cholesterol units="mg">300</cholesterol>
    <sodium units="mg">2400</sodium>
    <carb units="g">300</carb>
    <fiber units="g">25</fiber>
    <protein units="g">50</protein>
</daily-values>
</nutrition>
<food>
    <name>Avocado Dip</name>
    <mfr>Sunnydale</mfr>
    <serving units="g">29</serving>
    <calories total="110" fat="100"/>
    <total-fat>11</total-fat>
    <saturated-fat>3</saturated-fat>
    <cholesterol>5</cholesterol>
    <sodium>210</sodium>
    <carb>2</carb>
    <fiber>0</fiber>
    <protein>1</protein>
    <vitamins>
        <a>0</a>
        <c>0</c>
    </vitamins>
    <minerals>
        <ca>0</ca>
        <fe>0</fe>
    </minerals>
</food>

and then calling

message = message.substring(message.indexOf("<food>"), message.indexOf("</food>"));

returns

<food>
    <name>Avocado Dip</name>
    <mfr>Sunnydale</mfr>
    <serving units="g">29</serving>
    <calories total="110" fat="100"/>
    <total-fat>11</total-fat>
    <saturated-fat>3</saturated-fat>
    <cholesterol>5</cholesterol>
    <sodium>210</sodium>
    <carb>2</carb>
    <fiber>0</fiber>
    <protein>1</protein>
    <vitamins>
        <a>0</a>
        <c>0</c>
    </vitamins>
    <minerals>
        <ca>0</ca>
        <fe>0</fe>
    </minerals>

How do I get it to keep the closing </food> tag, given that I don't know the surrounding content of the XML file?
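For the single-element case shown above, the closing tag is dropped because the end index of String.substring is exclusive: message.indexOf("</food>") points at the first character of the tag, so the tag itself falls outside the range. A minimal sketch of one way around this is to add the length of the closing tag to the end index. The class name KeepDelimiter and the helper extractInclusive are hypothetical names used only for illustration, and the sketch assumes <food> elements are not nested.

```java
public class KeepDelimiter {

    /**
     * Returns the first block that starts at openTag and ends with closeTag,
     * including both tags, or null if either tag is missing.
     * (Hypothetical helper; assumes the tags are not nested.)
     */
    public static String extractInclusive(String message, String openTag, String closeTag) {
        int start = message.indexOf(openTag);
        if (start < 0) {
            return null;
        }
        // Look for the closing tag only after the opening tag.
        int end = message.indexOf(closeTag, start + openTag.length());
        if (end < 0) {
            return null;
        }
        // substring's end index is exclusive, so step past the closing tag.
        return message.substring(start, end + closeTag.length());
    }

    public static void main(String[] args) {
        String message = "<nutrition></nutrition>\n"
                + "<food>\n"
                + "    <name>Avocado Dip</name>\n"
                + "</food>\n"
                + "<other/>";
        System.out.println(extractInclusive(message, "<food>", "</food>"));
    }
}
```

This only finds the first <food> block and treats the XML as plain text, so it is reasonable for a quick fix but not for documents with several or nested <food> elements.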

Here's a solution using javax.xml. It aims to handle the case where multiple <food> elements are present in the document. To handle this case correctly, you need to:

  1. deserialize your XML into an org.w3c.dom.Document
  2. extract the list of <food> nodes as an org.w3c.dom.NodeList
  3. serialize back to a String at the end

Here's a simplified example:

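import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.junit.Test; // JUnit 4 assumed; JUnit 5 uses org.junit.jupiter.api.Test
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// The members below belong inside a test class.
private static final String XML =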
    "<?xml version = \"1.0\" encoding = \"UTF-8\"?>\n"
        + "<message>\n"
        + "  <food>\n"
        + "    <name>A</name>\n"
        + "  </food>\n"
        + "  <food>\n"
        + "    <name>B</name>\n"
        + "  </food>\n"
        + "</message>\n";

@Test
public void xpath() throws Exception {
  // Deserialize
  DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
  Document document;
  try (InputStream in = new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8))) {
    document = factory.newDocumentBuilder().parse(in);
  }
  XPath xPath = XPathFactory.newInstance().newXPath();
  XPathExpression expr = xPath.compile("//food");
  NodeList nodeList = (NodeList) expr.evaluate(document, XPathConstants.NODESET);

  for (int i = 0; i < nodeList.getLength(); i++) {
    Node node = nodeList.item(i);
    System.out.println(node.getNodeName() + ": " + node.getTextContent().trim());
  }

  // Serialize
  Document exportDoc = factory.newDocumentBuilder().newDocument();
  Node exportNode = exportDoc.importNode(nodeList.item(0), true);
  exportDoc.appendChild(exportNode);
  String content = serialize(exportDoc);
  System.out.println(content);
}

private static String serialize(Document doc) throws TransformerException {
  DOMSource domSource = new DOMSource(doc);
  StringWriter writer = new StringWriter();
  StreamResult result = new StreamResult(writer);
  TransformerFactory tf = TransformerFactory.newInstance();
  Transformer transformer = tf.newTransformer();
  // set indent
  transformer.setOutputProperty(OutputKeys.INDENT, "yes");
  transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
  transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
  transformer.transform(domSource, result);
  return writer.toString();
}

The first output shows that all <food> elements are deserialized correctly:

food: A
food: B

The second output shows that the first element is serialized back to a string:

<food>

  <name>A</name>

</food>
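The example above serializes only the first element. As a sketch of how the same idea might extend to every <food> element, the code below passes each node directly to an identity Transformer through DOMSource, which avoids building a separate export document. The class name FoodExtractor and the method extractAll are hypothetical, and getElementsByTagName is used here in place of the XPath expression for brevity.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FoodExtractor {

    /**
     * Parses the XML and returns each element named tagName as its own string,
     * opening and closing tags included. (Hypothetical helper for illustration.)
     */
    public static List<String> extractAll(String xml, String tagName) throws Exception {
        // Deserialize the whole document once.
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = document.getElementsByTagName(tagName);

        // An identity transformer copies the source node to the output unchanged.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            StringWriter writer = new StringWriter();
            // Serialize just this element and its subtree.
            transformer.transform(new DOMSource(nodes.item(i)), new StreamResult(writer));
            result.add(writer.toString());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<message><food><name>A</name></food><food><name>B</name></food></message>";
        for (String food : extractAll(xml, "food")) {
            System.out.println(food);
        }
    }
}
```

Because the document is parsed rather than searched as text, this approach does not depend on knowing what surrounds each <food> element.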
