How to identify an XML document in a text file containing many XML documents along with other text, using a node of the XML, in Java?
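I want to read an entire text file and, based on a search input, extract the second XML document and save it to a local drive.
Search input: Midnight Rain
Text file content: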
<?xml version="1.0"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications
with XML.</description>
</book>
</catalog>
controllercmds.statusupdate
ExtnClientExternalSrcProcess="9"
<catalog>
<book id="bk102">
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-12-16</publish_date>
<description>A former architect battles corporate zombies,
an evil sorceress, and her own childhood to become queen
of the world.</description>
</book>
</catalog>
My output should be:
<catalog>
<book id="bk102">
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-12-16</publish_date>
<description>A former architect battles corporate zombies,
an evil sorceress, and her own childhood to become queen
of the world.</description>
</book>
</catalog>
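Is this possible? Can anyone help me?
I think you should mention which programming language you are using, so that people can give you a solution with code. For now, a regular expression is the only solution I can think of, and for that you need to know which root tag the code should look for. In the example above I can see that <catalog> is the root tag. I will try to refine a code solution in a few hours.
The following code works on JDK 6 and should also work on later versions.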
String xml = "<?xml version=\"1.0\"?>" +
"<catalog>" +
"<book id=\"bk101\">" +
"<author>Gambardella, Matthew</author>" +
"<title>XML Developer's Guide</title>" +
"<genre>Computer</genre>" +
"<price>44.95</price>" +
"<publish_date>2000-10-01</publish_date>" +
"<description>An in-depth look at creating applications" +
"with XML.</description>" +
"</book>" +
"</catalog>" +
"controllercmds.statusupdate" +
"ExtnClientExternalSrcProcess=\"9\"" +
"<catalog>" +
"<book id=\"bk102\">" +
"<author>Ralls, Kim</author>" +
"<title>Midnight Rain</title>" +
"<genre>Fantasy</genre>" +
"<price>5.95</price>" +
"<publish_date>2000-12-16</publish_date>" +
"<description>A former architect battles corporate zombies," +
"an evil sorceress, and her own childhood to become queen " +
"of the world.</description>" +
"</book>" +
"</catalog>";
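// '<' and '>' need no escaping in a Java regex; '.*?' is reluctant, so each match stops at the first </catalog>.
String regex = "(<catalog>.*?</catalog>)";
// DOTALL lets '.' match line breaks too, which matters once the text comes from a real multi-line file.
java.util.regex.Pattern pattern = java.util.regex.Pattern.compile(regex, java.util.regex.Pattern.DOTALL);
java.util.regex.Matcher matcher = pattern.matcher(xml);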
while(matcher.find()) {
System.out.println("Groups: " + matcher.group(1));
}
System.out.println("DONE");
The output is:
Groups: <catalog><book id="bk101"><author>Gambardella, Matthew</author><title>XML Developer's Guide</title><genre>Computer</genre><price>44.95</price><publish_date>2000-10-01</publish_date><description>An in-depth look at creating applicationswith XML.</description></book></catalog>
Groups: <catalog><book id="bk102"><author>Ralls, Kim</author><title>Midnight Rain</title><genre>Fantasy</genre><price>5.95</price><publish_date>2000-12-16</publish_date><description>A former architect battles corporate zombies,an evil sorceress, and her own childhood to become queen of the world.</description></book></catalog>
DONE
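Applied to the original task (read the file, keep only the fragment that contains the search input, save it), the same regex idea might look roughly like the sketch below. It is an illustration under stated assumptions, not a definitive implementation: the class name `CatalogExtractor`, the method `findFragments`, and the file names `input.txt` and `result.xml` are all hypothetical, the file is assumed to be UTF-8, and the root tag is assumed to be `<catalog>`. It uses `Pattern.DOTALL` so that `.` also matches the line breaks present in a real file, and `java.nio.file`, which requires Java 7 or later.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CatalogExtractor {
    // DOTALL lets '.' span line breaks; the reluctant '.*?' stops at the first closing tag.
    private static final Pattern CATALOG =
            Pattern.compile("<catalog>.*?</catalog>", Pattern.DOTALL);

    /** Returns every <catalog> fragment in text that contains searchTerm. */
    static List<String> findFragments(String text, String searchTerm) {
        List<String> hits = new ArrayList<>();
        Matcher m = CATALOG.matcher(text);
        while (m.find()) {
            String fragment = m.group();
            if (fragment.contains(searchTerm)) {
                hits.add(fragment);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file names, for illustration only.
        Path input = Paths.get("input.txt");
        Path output = Paths.get("result.xml");

        String text = new String(Files.readAllBytes(input), StandardCharsets.UTF_8);
        List<String> hits = findFragments(text, "Midnight Rain");
        if (!hits.isEmpty()) {
            // Save the first matching fragment to the local drive.
            Files.write(output, hits.get(0).getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

The search here is a plain substring test on the whole fragment, so a term that happens to appear in an attribute or tag name would also match; restricting it to a specific element such as `<title>` would need a parser or a narrower pattern.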
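In the general case this will be difficult. However, if you know that the input conforms to certain specific constraints, it may be much easier. For example, if you know that the XML fragment will start with <catalog> and end with </catalog>, and you are highly confident that these two strings will not appear anywhere else, you can extract the XML fragment with a regular expression, and it should not be too difficult. So I think the answer depends heavily on what you know about the constraints, and on how much risk you are prepared to accept of start/end tags showing up in unexpected places, whether by accident or by malice.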
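One way to reduce that risk, sketched here as an assumption rather than something the answers above propose, is to pass each regex candidate through a real XML parser and discard anything that is not well-formed or does not have the expected root element. The class name `FragmentValidator` and the method `isWellFormed` are hypothetical; the parser calls are the standard JAXP API shipped with the JDK.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class FragmentValidator {
    /** Returns true if candidate parses as well-formed XML whose root element is expectedRoot. */
    static boolean isWellFormed(String candidate, String expectedRoot) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Reject DOCTYPE declarations so untrusted input cannot pull in external entities.
            // Assumption: the parser supports this feature (the JDK's built-in one does).
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            Document doc = builder.parse(new InputSource(new StringReader(candidate)));
            return expectedRoot.equals(doc.getDocumentElement().getNodeName());
        } catch (Exception e) {
            // Any parse or configuration failure means the candidate is rejected.
            return false;
        }
    }
}
```

For example, if the surrounding text happened to contain a stray `<catalog>` before a real fragment, the regex match would begin at the stray tag, and this check would reject it because the outer element is never closed.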