
Get text from XML between tags in Java

I have XML entries like the following. I want to extract everything from the point where the d:index tag closes to the end of the entry.

<d:entry id="some_id" d:title="some_title">
        <d:index d:value="some_value"/>
        <h1>headlines</h1>

        <p>paragraphs</p>
        <div>
           <ul>
              <li>lists</li>

           </ul>
        </div>
        text like that
</d:entry>

I tried using:

dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(file);
doc.getDocumentElement().normalize();
eList = doc.getElementsByTagName("d:entry");
for (int i = 0; i < eList.getLength(); i++) {
    Node nNode = eList.item(i);
    textList[i] = nNode.getTextContent();
}

But .getTextContent() only gives me 'text like that' and not the markup that I want:

<h1>headlines</h1>

<p>paragraphs</p>
   <div>
     <ul>
      <li>lists</li>

     </ul>
   </div>
text like that

Depending on what exactly you want to do, you could do something like this:

import java.io.File;
import java.io.IOException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

public class Arbeiter {

public void arbeiten(File datei)
{
    Document doc = getDoc(datei);
    Element element = doc.getDocumentElement();
    print(element);
}

private Document getDoc(File datei)
{
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    Document doc = null;
    try {
        DocumentBuilder db = dbf.newDocumentBuilder();
        doc = db.parse(datei);
    } catch (ParserConfigurationException | SAXException | IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    return doc;
}

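private void print(Node node)
{
    // Visit each child in document order.
    for (int i = 0; i < node.getChildNodes().getLength(); i++)
    {
        print(node.getChildNodes().item(i));
    }
    // Print only non-blank text nodes; getTextContent() on an element
    // would repeat the text of all its descendants.
    if (node.getNodeType() == Node.TEXT_NODE
            && !node.getTextContent().trim().isEmpty())
    {
        System.out.println(node.getTextContent().trim());
    }
}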

}

The output is:

headlines
paragraphs
lists
text like that
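
If the goal is the markup itself rather than only the text, one possible approach is to serialize each child node that follows d:index with an identity Transformer from javax.xml.transform. The following is a minimal sketch, not a definitive implementation. The class name InnerXml, the helper names parse and contentAfterIndex, and the xmlns:d="urn:example" declaration are assumptions made for illustration; the real file would supply its own namespace declaration and would be parsed from a File as in the code above.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class InnerXml {

    // Hypothetical helper: parses an XML string and returns its root element.
    static Element parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
    }

    // Serializes every sibling that follows the first d:index child of the entry,
    // keeping element tags instead of flattening them to text.
    static String contentAfterIndex(Element entry) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        boolean seenIndex = false;
        for (Node child = entry.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (!seenIndex) {
                // Skip everything up to and including d:index.
                seenIndex = "d:index".equals(child.getNodeName());
                continue;
            }
            transformer.transform(new DOMSource(child), new StreamResult(out));
        }
        return out.toString().trim();
    }

    public static void main(String[] args) throws Exception {
        // xmlns:d="urn:example" is a placeholder namespace, not taken from the question.
        String xml = "<d:entry xmlns:d=\"urn:example\" id=\"some_id\" d:title=\"some_title\">"
                + "<d:index d:value=\"some_value\"/>"
                + "<h1>headlines</h1><p>paragraphs</p>"
                + "<div><ul><li>lists</li></ul></div>"
                + "text like that</d:entry>";
        System.out.println(contentAfterIndex(parse(xml)));
    }
}
```

To apply this to the question's loop, each node from getElementsByTagName("d:entry") could be cast to Element and passed to contentAfterIndex in place of the getTextContent() call.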

