RapidXML reading from file - what is wrong here?

What's the difference between these two methods of reading an input file?

1) Using 'ifstream.get()'

and

2) Using a vector<char> with istreambuf_iterator<char> (less understood by me!)

(other than the obvious answer of having nifty vector methods to work with)

The input file is XML and, as you can see below, it is immediately parsed into a rapidxml document (initialized elsewhere; see the example main function).

First, let me show you two ways to write the 'load_config' function, one using ifstream.get() and one using vector<char>.

Method 1, ifstream.get(), provides working code and a safe rapidXML document object:

rapidxml::xml_document<> *load_config(rapidxml::xml_document<> *doc){
   ifstream myfile("inputfile");

   //read in config file
   char ch;
   char buffer[65536];
   size_t chars_read = 0;

   while(myfile.get(ch) && (chars_read < 65535)){
      buffer[chars_read++] = ch;
   }
   buffer[chars_read++] = '\0';

   cout<<"clearing old doc"<<endl;
   doc->clear();

   doc->parse<0>(buffer);

   //debug returns as expected here
   cout << "load_config: Name of my first node is: " << doc->first_node()->name() << "\n";

   return doc;
}

Method 2 results in a rapidXML document that gets clobbered by another library - specifically, by a call to curl_global_init(CURL_GLOBAL_SSL) [see main code below] - but I'm not blaming it on curl_global_init just yet.

rapidxml::xml_document<> *load_config(rapidxml::xml_document<> *doc){
   ifstream myfile("inputfile");

   vector<char> buffer((istreambuf_iterator<char>(myfile)),
                istreambuf_iterator<char>());
   buffer.push_back('\0');

   cout<<"file looks like:"<<endl;  //looks fine
   cout<<&buffer[0]<<endl;

   cout<<"clearing old doc"<<endl;
   doc->clear();

   doc->parse<0>(&buffer[0]);

   //debug prints as expected
   cout << "load_config: Name of my first node is: " << doc->first_node()->name() << "\n";

   return doc;
}

main code:

int main(void){
   rapidxml::xml_document<> *doc;
   doc = new rapidxml::xml_document<>;

   load_config(doc);

   // this works fine:
   cout << "Name of my first node is: " << doc->first_node()->name() << "\n"; 

   curl_global_init(CURL_GLOBAL_SSL);  //Docs say do this first.

   // debug broken object instance:
   // note a trashed 'doc' here if using vector<char> method 
   //  - seems to be because of above line... name is NULL 
   //    and other nodes are now NULL
   //    causing segfaults down stream.
   cout << "Name of my first node is: " << doc->first_node()->name() << "\n";

   // ... (rest of main omitted)
}

I am pretty darn sure this is all executed in a single thread, but maybe there is something going on beyond my understanding.

I'm also worried that I only fixed a symptom, not a cause, by simply changing my file load function. Looking to the community for help here!

Question: Why would moving away from the vector to a character array fix this?

Hint: I'm aware that rapidXML uses some clever memory management that actually accesses the input string directly.

Hint: The main function above creates a dynamic (new) xml_document. This was not in the original code, and is an artifact of debugging changes. The original (failing) code declared it and did not dynamically allocate it, but identical problems occurred.

Another hint, for full disclosure (although I don't see why it matters): there is another instance of a vector in this mess of code that is populated by the data in the rapidxml::xml_document object.

The only difference between the two is that the vector version reads the whole file whatever its size, while the char array version is limited to 65535 characters. As written, the loop stops at that point and the \0 is stored in the last element, buffer[65535], so a longer file is silently truncated. (If the array were one element smaller, or the limit one larger, that write would be out of bounds, which is undefined behavior.)

Another problem, common to both versions, is that you read the file into memory that has a shorter lifetime than the xml_document. Read the documentation:

The string must persist for the lifetime of the document.

When load_config exits, the vector is destroyed and its memory is freed. Any later attempt to access the document reads invalid memory (undefined behavior).

In the char array version the memory is allocated on the stack. It is still 'freed' when load_config exits (accessing it causes undefined behavior), but you don't see the crash because it has not yet been overwritten.
