Best way to extract data between several XML tags from a file
> <DOC>
> <Car> Zen </Car>
> <Description> This is a bla bla model. which is a bla
> bla thisnf dsgs
> sdfsgssssssssssssssssssssssssssssssssssssssssssttttttttttttttwqqqqqqqqq
> dsgdsdsssssssssssegsegsdgsdgsdsssssssssssssssssssssttttttttttttt
> sdgssddddddddddddddddddddddddddddddddddddddddddddddsdddddddddwwww
> dgdssdddddddddddddddddddddddddddddsssssssssssssssssssssssswwwwwwwwwwww
> gdgdsssssssssssssssssssssssssssssssssssssssssssssssssseeeeeeeeeeeeee
> gddsssssssssssssssssssssssseeeeeeeeeeeeeeeeeeeeeeeeeeeeeeqqqqqqqqqqq
> gsdsssssssssssssssssssssssssssssseqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
> dsssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
> arrwerfkafjsdfsojfiosjfiosdfoisdoifjsdoifjiosdjfosdj</Description>
> <Year> 2015 </Year> <Color> Red </Color>
> <Engine> afsdf </Engine>
> </DOC>
--- more tags ---
> <DOC>
> <Car> Zen1 </Car> <Description> This is the second text tag which is a
> bla bla thisnf dsgs
> sdfsgssssssssssssssssssssssssssssssssssssssssssttttttttttttttwqqqqqqqqq
> dsgdsdsssssssssssegsegsdgsdgsdsssssssssssssssssssssttttttttttttt
> sdgssddddddddddddddddddddddddddddddddddddddddddddddsdddddddddwwww
> dgdssdddddddddddddddddddddddddddddsssssssssssssssssssssssswwwwwwwwwwww
> gdgdsssssssssssssssssssssssssssssssssssssssssssssssssseeeeeeeeeeeeee
> gddsssssssssssssssssssssssseeeeeeeeeeeeeeeeeeeeeeeeeeeeeeqqqqqqqqqqq
> gsdsssssssssssssssssssssssssssssseqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
> dsssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
> arrwerfkafjsdfsojfiosjfiosdfoisdoifjsdoifjiosdjfosdj</Description>
> <Year> 2015 </Year> <Color> Red </Color> <Engine> afsdf </Engine>
> </DOC>
--- more tags ---
I have several files, and each one contains many of these tags. What is the best way to extract the data between the tags? Here is my approach:
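> // Uses java.io.* plus Elasticsearch's XContentBuilder and the static
> // XContentFactory.jsonBuilder(), as in the original; `files` is the File[] to scan.
> for (File file : files) {
>     System.out.println(file.getName());
>     if (!file.isFile()) {
>         continue;
>     }
>     // try-with-resources closes the reader even if an exception is thrown
>     try (BufferedReader bufferReader = new BufferedReader(new FileReader(file))) {
>         String line;
>         String car = null;
>         StringBuilder sb = null;           // description text of the current <DOC>
>         boolean withinDescription = false;
>         while ((line = bufferReader.readLine()) != null) {
>             line = line.trim();
>             if (line.equals("<DOC>")) {    // start of a new record
>                 car = null;
>                 sb = new StringBuilder();
>                 continue;
>             }
>             // Tag names are case-sensitive: the files use <Car>, not <CAR>
>             int carStart = line.indexOf("<Car>");
>             int carEnd = line.indexOf("</Car>");
>             if (carStart >= 0 && carEnd > carStart) {
>                 car = line.substring(carStart + "<Car>".length(), carEnd).trim();
>             }
>             // <Description> can share a line with other tags and span many lines
>             int descStart = line.indexOf("<Description>");
>             if (descStart >= 0) {
>                 withinDescription = true;
>                 line = line.substring(descStart + "<Description>".length());
>             }
>             if (withinDescription) {
>                 int descEnd = line.indexOf("</Description>");
>                 if (descEnd >= 0) {
>                     sb.append(line, 0, descEnd);   // keep only the text before the closing tag
>                     withinDescription = false;
>                 } else {
>                     sb.append(line).append(' ');   // description continues on the next line
>                 }
>             }
>             if (line.equals("</DOC>")) {
>                 // JSONifying the data of this record
>                 XContentBuilder builder = jsonBuilder()
>                         .startObject()
>                         .field("CAR", car)
>                         .field("text", sb.toString().trim())
>                         .endObject();
>                 // string() only exists in older Elasticsearch versions; later ones use Strings.toString(builder)
>                 String json = builder.string();
>                 System.out.println(json);
>                 // ---- make the database call here and store the CAR and DESCRIPTION values
>                 sb = null;
>             }
>         }
>     } catch (IOException e) {
>         e.printStackTrace();
>     }
> }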
Any suggestions are welcome. Thanks in advance!
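There is an excellent post here that discusses how to use a DOM parser to read an XML file into a tree. The important thing to remember is that your XML file must have a single root element. Otherwise, you will get the following SAXParseException: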
org.xml.sax.SAXParseException: The markup in the document following the root element must be well-formed.
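When the real problem is a missing root element, this message is misleading. The parser assumes that the first element it meets (here, the first `<DOC>`) is the root element. When it then finds more markup after that element's closing tag (the next `<DOC>`), it trips and gives up. The parser fails because it is building a single tree: without one enclosing root, the remaining tags are left dangling with nothing to attach to.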
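A minimal sketch of that idea, assuming each file is well-formed XML apart from the missing root (in particular, no unescaped `&` or `<` inside the descriptions): wrap the `<DOC>` sequence in a synthetic root element (`<ROOT>` is a made-up name), parse it with the JDK's built-in DOM parser, and read `Car` and `Description` from every `DOC` element. The class and method names (`DocExtractor`, `parseWithRoot`, `childText`, `extract`) are hypothetical. `main` first parses the same text without the wrapper to show the exception quoted above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXParseException;

public class DocExtractor {

    // Parses text holding several top-level <DOC> elements by wrapping it
    // in a synthetic root element (<ROOT> is a hypothetical name).
    static Document parseWithRoot(String fragment) throws Exception {
        String wrapped = "<ROOT>" + fragment + "</ROOT>";
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return db.parse(new ByteArrayInputStream(wrapped.getBytes(StandardCharsets.UTF_8)));
    }

    // Returns the trimmed text of the first descendant element named `tag`, or null.
    static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    // Returns one {Car, Description} pair per <DOC> element, in document order.
    static List<String[]> extract(String fragment) throws Exception {
        NodeList docs = parseWithRoot(fragment).getElementsByTagName("DOC");
        List<String[]> result = new ArrayList<>();
        for (int i = 0; i < docs.getLength(); i++) {
            Element d = (Element) docs.item(i);
            result.add(new String[] {childText(d, "Car"), childText(d, "Description")});
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String fragment =
            "<DOC>\n<Car> Zen </Car>\n<Description> This is a bla bla model.\nmore text</Description>\n"
            + "<Year> 2015 </Year> <Color> Red </Color>\n</DOC>\n"
            + "<DOC>\n<Car> Zen1 </Car> <Description> This is the second text tag</Description>\n</DOC>\n";

        // Without a root element the parser rejects everything after the first </DOC>.
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(fragment.getBytes(StandardCharsets.UTF_8)));
        } catch (SAXParseException e) {
            System.out.println("Without root: " + e.getMessage());
        }

        // With the synthetic root, every <DOC> becomes a child of <ROOT>.
        for (String[] rec : extract(fragment)) {
            System.out.println(rec[0] + " -> " + rec[1]);
        }
    }
}
```

To process a real file, read it into a `String` (for example with `Files.readString(path)` on Java 11+) and pass it to `extract`. For very large files, the same wrapping could be done on streams with `java.io.SequenceInputStream` instead of building one big string.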
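Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish this page, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.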