How can I use XML nodes in Java to identify one XML document inside a text file that contains many XML documents plus other text?

Question

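I want to read an entire text file and, based on a search input, extract the second XML document and save it to the local drive.

Search input: Midnight Rain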

Text file contents:

<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications 
      with XML.</description>
   </book>
</catalog>
controllercmds.statusupdate
ExtnClientExternalSrcProcess="9"
<catalog>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description>
   </book>
</catalog>

My output should be:

<catalog>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description>
   </book>
</catalog>

Is this feasible? Can anyone help me?

Answer 1

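I think you should mention which programming language you are using, so that people can give you a solution with code. For now, the only solution I can think of is a regular expression, and for that you have to know which root tag the code should look for. In the sample above, I can see that <catalog> is the root tag. I will try to work out a code solution in the next few hours.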

The following code works on JDK 6 and should also work on later versions:

String xml = "<?xml version=\"1.0\"?>" +
"<catalog>" +
"<book id=\"bk101\">" +
   "<author>Gambardella, Matthew</author>" +
   "<title>XML Developer's Guide</title>" +
   "<genre>Computer</genre>" +
   "<price>44.95</price>" +
   "<publish_date>2000-10-01</publish_date>" +
   "<description>An in-depth look at creating applications" + 
   "with XML.</description>" +
"</book>" +
"</catalog>" +
"controllercmds.statusupdate" +
"ExtnClientExternalSrcProcess=\"9\"" +
"<catalog>" +
"<book id=\"bk102\">" +
   "<author>Ralls, Kim</author>" +
   "<title>Midnight Rain</title>" +
   "<genre>Fantasy</genre>" +
   "<price>5.95</price>" +
   "<publish_date>2000-12-16</publish_date>" +
   "<description>A former architect battles corporate zombies," + 
   "an evil sorceress, and her own childhood to become queen " +
   "of the world.</description>" +
"</book>" +
"</catalog>";

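// Reluctant match: each hit runs from a <catalog> to the nearest </catalog>.
// "<" and ">" are not regex metacharacters, so they need no escaping.
String regex = "(<catalog>.*?</catalog>)";

java.util.regex.Pattern pattern = java.util.regex.Pattern.compile(regex);
java.util.regex.Matcher matcher = pattern.matcher(xml);

// Print every fragment found, in order of appearance.
while (matcher.find()) {
    System.out.println("Groups: " + matcher.group(1));
}

System.out.println("DONE");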

The output is:

Groups: <catalog><book id="bk101"><author>Gambardella, Matthew</author><title>XML Developer's Guide</title><genre>Computer</genre><price>44.95</price><publish_date>2000-10-01</publish_date><description>An in-depth look at creating applicationswith XML.</description></book></catalog>
Groups: <catalog><book id="bk102"><author>Ralls, Kim</author><title>Midnight Rain</title><genre>Fantasy</genre><price>5.95</price><publish_date>2000-12-16</publish_date><description>A former architect battles corporate zombies,an evil sorceress, and her own childhood to become queen of the world.</description></book></catalog>
DONE

See the code running online here.
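The snippet above matches against a hard-coded string that contains no line breaks. A real file will contain them, and `.` does not match line terminators unless the pattern is compiled with `Pattern.DOTALL`. The following is a minimal sketch of the full task from the question (read the file, pick the fragment that contains the search text, save it), assuming Java 7 or later for `java.nio.file.Files`. The class name `FragmentExtractor`, the method `findFragment`, and the file names `input.txt` and `result.xml` are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FragmentExtractor {

    // DOTALL lets "." match line breaks, which fragments read from a file will contain.
    private static final Pattern CATALOG =
            Pattern.compile("<catalog>.*?</catalog>", Pattern.DOTALL);

    /** Returns the first catalog fragment that contains searchText, or null if none does. */
    static String findFragment(String text, String searchText) {
        Matcher matcher = CATALOG.matcher(text);
        while (matcher.find()) {
            String fragment = matcher.group();
            if (fragment.contains(searchText)) {
                return fragment;
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file names, used only for illustration.
        Path input = Paths.get("input.txt");
        Path output = Paths.get("result.xml");

        String text = new String(Files.readAllBytes(input), StandardCharsets.UTF_8);
        String fragment = findFragment(text, "Midnight Rain");
        if (fragment == null) {
            System.out.println("No matching fragment found");
        } else {
            Files.write(output, fragment.getBytes(StandardCharsets.UTF_8));
            System.out.println("Saved fragment to " + output);
        }
    }
}
```

Note that `contains` is a plain substring test: it would also accept a fragment in which the search text appears in, say, the description rather than the title.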

Answer 2

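In the general case, this will be difficult. However, if you know that the input satisfies certain specific constraints, it may be much easier. For example, if you know that each XML fragment begins with <catalog> and ends with </catalog>, and you are highly confident that these two strings will not appear anywhere else, then extracting the XML fragments with a regular expression should not be too difficult. So I think the answer depends heavily on what you know about the constraints, and on how much risk you are prepared to accept of start/end tags turning up "accidentally" (or maliciously!) in unexpected places.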
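One way to reduce that risk, sketched here under the same assumption that fragments are delimited by <catalog> and </catalog>, is to treat each regex hit only as a candidate, hand it to a real XML parser, and then compare the text of its `title` nodes with the search input. Candidates that are not well-formed are skipped. The class name `NodeAwareExtractor` and the method `findByTitle` are hypothetical; the parser calls are the standard JAXP DOM API shipped with the JDK.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class NodeAwareExtractor {

    // Assumed delimiters; DOTALL lets "." span line breaks.
    private static final Pattern CANDIDATE =
            Pattern.compile("<catalog>.*?</catalog>", Pattern.DOTALL);

    /**
     * Returns the first well-formed catalog fragment that has a title
     * element whose text equals wantedTitle, or null if there is none.
     */
    static String findByTitle(String text, String wantedTitle) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        Matcher matcher = CANDIDATE.matcher(text);
        while (matcher.find()) {
            String candidate = matcher.group();
            // A fresh builder per candidate keeps a failed parse from affecting the next one.
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            Document doc;
            try {
                doc = builder.parse(new InputSource(new StringReader(candidate)));
            } catch (SAXException e) {
                continue; // not well-formed XML, so not a real fragment
            }
            NodeList titles = doc.getElementsByTagName("title");
            for (int i = 0; i < titles.getLength(); i++) {
                if (wantedTitle.equals(titles.item(i).getTextContent().trim())) {
                    return candidate;
                }
            }
        }
        return null;
    }
}
```

Calling `NodeAwareExtractor.findByTitle(fileContents, "Midnight Rain")` on the sample file from the question should return the second <catalog> block. This does not remove the risk described above: a stray </catalog> inside the surrounding text can still cut a candidate short, in which case the fragment is rejected rather than repaired.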

Answer 1
0 2015-10-14 04:25:24

Answer 2
0 2015-10-14 07:48:16
