
RapidXML (C++): Accessing text within node but outside child-node

XML:

<node> 
    Lorem ipsum 
        <child-node>dolor</child-node>
    TEXT TO BE ACCESSED 
</node>

<node> 
    sed do eiusmod tempor etc. 
</node>

This is read into a rapidxml::xml_document<> and parsed with the flag rapidxml::parse_validate_closing_tags as follows: doc.parse<rapidxml::parse_validate_closing_tags>(). (I would have thought that this flag solved the issue, but that does not appear to be the case.)

RapidXML C++ code looping through all <node>s of doc:

for (const rapidxml::xml_node<> *node = doc.first_node("node"); node != nullptr; node = node->next_sibling()) {
    std::cout << node->value();
}

node->value() returns Lorem ipsum on the first iteration of the loop.

The text within the <child-node> (dolor) is accessible by creating a new pointer node_2 = node->first_node() within the loop and then reading node_2->value(). However, the text that follows the <child-node> (TEXT TO BE ACCESSED) is not accessible in a similar way. The documentation does not offer much advice on this. How might this be done with RapidXML?

The XML is intended to encode an edition of a text (following e.g. the Perseus Digital Library), so the format used above is useful for marking specific words within sentences, etc.

RapidXML parses XML into nodes of different types, in particular node_element and node_data nodes. For example, your <child-node>dolor</child-node> is actually a node_element node which contains a node_data with the value "dolor".

To make user code simpler, calling value() on a node_element returns the value of its first data node. If you have more complex markup, you can iterate over the child nodes and pick out the data nodes to extract those values.

Untested code below

for (const rapidxml::xml_node<> *node = doc.first_node("node"); node; node = node->next_sibling())
{
  // Walk every direct child of <node>: both elements and text runs.
  for (const rapidxml::xml_node<> *n = node->first_node(); n; n = n->next_sibling())
  {
    if (n->type() == rapidxml::node_element)
    {
      // handle child-node
    }
    else if (n->type() == rapidxml::node_data)
    {
      std::cout << n->value(); // handle the regular data nodes
    }
  }
}
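
To make the tree shape concrete without depending on the library, here is a minimal self-contained sketch. Everything in it (the Node struct, the Kind enum, and the helpers data, element, first_text, direct_text, full_text and sample) is a hypothetical stand-in invented for illustration and is not part of RapidXML. It only models the idea described above: mixed content becomes an element whose children are an alternating sequence of data nodes and child elements, so the trailing text is simply a later data child.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in types; this is NOT RapidXML's API.
enum class Kind { Element, Data };

struct Node {
    Kind kind;
    std::string name;            // tag name for elements, empty for data
    std::string text;            // text for data nodes, empty for elements
    std::vector<Node> children;  // only elements have children
};

Node data(std::string text) {
    return Node{Kind::Data, "", std::move(text), {}};
}

Node element(std::string name, std::vector<Node> children) {
    return Node{Kind::Element, std::move(name), "", std::move(children)};
}

// Models the convenience behaviour: only the first data child is returned.
std::string first_text(const Node& n) {
    assert(n.kind == Kind::Element);
    for (const Node& c : n.children)
        if (c.kind == Kind::Data) return c.text;
    return "";
}

// Every text run that sits directly inside n, skipping child elements.
std::vector<std::string> direct_text(const Node& n) {
    std::vector<std::string> out;
    for (const Node& c : n.children)
        if (c.kind == Kind::Data) out.push_back(c.text);
    return out;
}

// All text in document order, descending into child elements as well.
std::string full_text(const Node& n) {
    if (n.kind == Kind::Data) return n.text;
    std::string out;
    for (const Node& c : n.children) out += full_text(c);
    return out;
}

// Hand-built tree for the first <node> of the question (whitespace simplified).
Node sample() {
    return element("node", {
        data("Lorem ipsum "),
        element("child-node", {data("dolor")}),
        data(" TEXT TO BE ACCESSED"),
    });
}
```

With this model, first_text(sample()) yields only the leading run, while direct_text(sample()) also contains the text after <child-node>. For an edition of a text, a recursive walk like full_text is probably the more useful shape, since it keeps the marked words in reading order.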

