
How to identify an XML document, by one of its nodes, in a text file that contains many XML documents along with other text, using Java?

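I want to read the entire text file and, based on a search input, extract the second XML document and save it to my local drive.

Search input: Midnight Rain

Contents of the text file: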

<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications 
      with XML.</description>
   </book>
</catalog>
controllercmds.statusupdate
ExtnClientExternalSrcProcess="9"
<catalog>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description>
   </book>
</catalog>

My output should be:

<catalog>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description>
   </book>
</catalog>

Is this feasible? Can someone help me?

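I think you should mention the programming language you are using so that people can give you a solution with code. For now, the only solution I can think of is a regular expression, and for that you must know which root tag the code should look for. In the sample above, I can see that <catalog> is the root tag. I will try to refine a code solution in a few hours.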
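As a rough sketch of the root-tag idea described above, the helper below builds the regular expression from a root tag name supplied by the caller. The class name `RootTagScanner` and the method `findFragments` are hypothetical names chosen for illustration. The sketch assumes that the root element is never nested inside another element of the same name and that the closing tag does not appear inside comments or CDATA sections.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RootTagScanner {

    // Returns every <rootTag>...</rootTag> span found in the text, in order of appearance.
    // Assumption: the root element is not nested inside an element of the same name.
    static List<String> findFragments(String text, String rootTag) {
        // Pattern.quote protects against regex metacharacters in the tag name.
        String quoted = Pattern.quote(rootTag);
        // (?:\s[^>]*)? allows optional attributes on the opening tag.
        // DOTALL lets '.' match line breaks, and .*? stops at the nearest closing tag.
        Pattern pattern = Pattern.compile(
                "<" + quoted + "(?:\\s[^>]*)?>.*?</" + quoted + ">", Pattern.DOTALL);
        Matcher matcher = pattern.matcher(text);
        List<String> fragments = new ArrayList<String>();
        while (matcher.find()) {
            fragments.add(matcher.group());
        }
        return fragments;
    }

    public static void main(String[] args) {
        // Small made-up input: two catalogs separated by an unrelated line.
        String text = "<catalog><book id=\"bk101\"/></catalog>\n"
                + "controllercmds.statusupdate\n"
                + "<catalog><book id=\"bk102\"/></catalog>";
        for (String fragment : findFragments(text, "catalog")) {
            System.out.println(fragment);
        }
    }
}
```

Passing the tag name as a parameter keeps the pattern in one place if the root element turns out to be something other than `catalog`.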

The following code works on JDK 6 and should also work on later versions:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractCatalog {
    public static void main(String[] args) {
        // Sample input: two <catalog> documents separated by unrelated text.
        String xml = "<?xml version=\"1.0\"?>" +
        "<catalog>" +
        "<book id=\"bk101\">" +
           "<author>Gambardella, Matthew</author>" +
           "<title>XML Developer's Guide</title>" +
           "<genre>Computer</genre>" +
           "<price>44.95</price>" +
           "<publish_date>2000-10-01</publish_date>" +
           "<description>An in-depth look at creating applications" +
           "with XML.</description>" +
        "</book>" +
        "</catalog>" +
        "controllercmds.statusupdate" +
        "ExtnClientExternalSrcProcess=\"9\"" +
        "<catalog>" +
        "<book id=\"bk102\">" +
           "<author>Ralls, Kim</author>" +
           "<title>Midnight Rain</title>" +
           "<genre>Fantasy</genre>" +
           "<price>5.95</price>" +
           "<publish_date>2000-12-16</publish_date>" +
           "<description>A former architect battles corporate zombies," +
           "an evil sorceress, and her own childhood to become queen " +
           "of the world.</description>" +
        "</book>" +
        "</catalog>";

        // Reluctant match from each opening root tag to the nearest closing tag.
        String regex = "(<catalog>.*?</catalog>)";

        // DOTALL lets '.' match line breaks, which matters once the text comes from a real file.
        Pattern pattern = Pattern.compile(regex, Pattern.DOTALL);
        Matcher matcher = pattern.matcher(xml);

        // Print every <catalog> fragment found in the input.
        while (matcher.find()) {
            System.out.println("Groups: " + matcher.group(1));
        }

        System.out.println("DONE");
    }
}

The output is:

Groups: <catalog><book id="bk101"><author>Gambardella, Matthew</author><title>XML Developer's Guide</title><genre>Computer</genre><price>44.95</price><publish_date>2000-10-01</publish_date><description>An in-depth look at creating applicationswith XML.</description></book></catalog>
Groups: <catalog><book id="bk102"><author>Ralls, Kim</author><title>Midnight Rain</title><genre>Fantasy</genre><price>5.95</price><publish_date>2000-12-16</publish_date><description>A former architect battles corporate zombies,an evil sorceress, and her own childhood to become queen of the world.</description></book></catalog>
DONE
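The snippet above prints every catalog. To get closer to what the question asks for, the sketch below keeps only the fragment that contains the search input and writes it to disk. The class name `CatalogSaver`, the method `findCatalogContaining`, and the default file names `input.txt` and `result.xml` are hypothetical placeholders. It also uses `java.nio.file`, so unlike the JDK 6 snippet it assumes Java 7 or later, and it assumes the file is UTF-8 encoded.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CatalogSaver {

    // DOTALL so that '.' also matches the line breaks inside each fragment.
    private static final Pattern CATALOG =
            Pattern.compile("<catalog>.*?</catalog>", Pattern.DOTALL);

    // Returns the first <catalog> fragment that contains the search term, or null if none does.
    static String findCatalogContaining(String text, String searchTerm) {
        Matcher matcher = CATALOG.matcher(text);
        while (matcher.find()) {
            if (matcher.group().contains(searchTerm)) {
                return matcher.group();
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical defaults; pass real paths and a search term as arguments.
        Path input = Paths.get(args.length > 0 ? args[0] : "input.txt");
        Path output = Paths.get(args.length > 1 ? args[1] : "result.xml");
        String searchTerm = args.length > 2 ? args[2] : "Midnight Rain";

        if (!Files.exists(input)) {
            System.out.println("Input file not found: " + input);
            return;
        }

        // Assumption: the file is small enough to read into memory and is UTF-8 encoded.
        String text = new String(Files.readAllBytes(input), StandardCharsets.UTF_8);
        String fragment = findCatalogContaining(text, searchTerm);
        if (fragment == null) {
            System.out.println("No <catalog> fragment contains: " + searchTerm);
        } else {
            Files.write(output, fragment.getBytes(StandardCharsets.UTF_8));
            System.out.println("Saved to " + output.toAbsolutePath());
        }
    }
}
```

A plain `contains` check matches the search term anywhere in the fragment, including the description, so it is only suitable when the term is known to be distinctive.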


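In the general case this will be difficult. However, if you know that the input conforms to certain specific constraints, it may be much easier. For example, if you know that the XML fragments will begin with <catalog> and end with </catalog>, and you are highly confident that these two strings will not appear anywhere else, then extracting the XML fragments with a regular expression should not be too difficult. So I think the answer depends largely on what you know about the constraints, and on how much risk you are prepared to accept of start/end tags appearing "by accident" (or maliciously!) in unexpected places.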
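One way to reduce the risk described above is to treat each regex match only as a candidate and confirm it with a real XML parser before accepting it. The sketch below does this with the JDK's built-in DOM parser and then compares the text of the `<title>` elements with the search input. The class name `ValidatedCatalogFinder` and the method `findByTitle` are hypothetical, and the sketch assumes that the title is the field being searched.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class ValidatedCatalogFinder {

    private static final Pattern CATALOG =
            Pattern.compile("<catalog>.*?</catalog>", Pattern.DOTALL);

    // Returns the first candidate that is well-formed XML and has a <title>
    // whose trimmed text equals the given title, or null if there is none.
    static String findByTitle(String text, String title) {
        DocumentBuilder builder;
        try {
            builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No usable XML parser", e);
        }
        // DefaultHandler rethrows fatal errors without printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());

        Matcher matcher = CATALOG.matcher(text);
        while (matcher.find()) {
            String candidate = matcher.group();
            Document doc;
            try {
                doc = builder.parse(new InputSource(new StringReader(candidate)));
            } catch (SAXException e) {
                continue; // not well-formed, so the regex matched something that is not a document
            } catch (IOException e) {
                throw new IllegalStateException(e); // not expected when reading from a String
            }
            NodeList titles = doc.getElementsByTagName("title");
            for (int i = 0; i < titles.getLength(); i++) {
                if (title.equals(titles.item(i).getTextContent().trim())) {
                    return candidate;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Made-up input: the first candidate is broken, the second is well-formed.
        String text = "<catalog><book><title>Midnight Rain</title></catalog>\n"
                + "controllercmds.statusupdate\n"
                + "<catalog><book><title>Midnight Rain</title></book></catalog>";
        System.out.println(findByTitle(text, "Midnight Rain"));
    }
}
```

Parsing does not make the regex itself any smarter: a stray `</catalog>` inside a fragment still truncates the match. It does, however, turn that kind of silent corruption into a candidate that is rejected instead of saved.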
