
Address out of bounds error when reading XML

I am getting a weird segfault when using libxml to parse a file. This code worked when I compiled it as a 32-bit application, but after I changed it to a 64-bit application it stopped working.

The segfault occurs at "if (xmlStrcmp(cur->name, (const xmlChar *) "servers"))".

cur->name is a const xmlChar *, and the debugger reports the address it points to as out of bounds. But when I go to that memory location in the debugger, the data there is correct.

int XmlGetServers()
{
    xmlDocPtr doc;
    xmlNodePtr cur;

    doc = xmlParseFile("Pin.xml");
    if (doc == NULL)
    {
        std::cout << "\n Pin.xml not parsed successfully." << std::endl;
        return -1;
    }
    cur = xmlDocGetRootElement(doc);

    if (cur == NULL)
    {
        std::cout << "\n Pin.xml is empty document." << std::endl;
        xmlFreeDoc(doc);
        return -1;
    }
    // The segfault happens here, when cur->name is read.
    if (xmlStrcmp(cur->name, (const xmlChar *) "servers"))
    {
        std::cout << "\n ERROR: Pin.xml of the wrong type, root node != servers." << std::endl;
        xmlFreeDoc(doc);
        return -1;
    }

    // Release the document and report success (falling off the end of a
    // non-void function is undefined behavior).
    xmlFreeDoc(doc);
    return 0;
}

Before cur is initialized, the debugger shows its name member as:

Name : name
    Details:0xed11f72000007fff <Address 0xed11f72000007fff out of bounds>

After cur is initialized, it shows:

Name : name
    Details:0x64c43000000000 <Address 0x64c43000000000 out of bounds> 

Referenced XML file (Pin.xml):

<?xml version="1.0"?>

<servers>

<server_info>

    <server_name>Server1</server_name>

    <server_ip>127.0.0.1</server_ip> 

    <server_data_port>9000</server_data_port> 

</server_info>

<server_info>

    <server_name>Server2</server_name> 

    <server_ip>127.0.0.1</server_ip> 

    <server_data_port>9001</server_data_port> 

</server_info>

</servers>

System:

OS: Red Hat Enterprise Linux 6.4, 64-bit

GCC: 4.4.7-3

Package: libxml2-2.7.6-8.el6_3.4.x86_64

I took your code as-is and added:

#include <libxml/parser.h>
#include <iostream>

Then I renamed the function to main() and compiled it on x86-64 Fedora 22, which has libxml2 2.9.2.

The resulting program ran successfully on the sample file with no segfaults. Even valgrind found no memory access violations. As proof, here is the abbreviated strace log:

stat("Pin.xml", {st_mode=S_IFREG|0644, st_size=362, ...}) = 0
stat("Pin.xml", {st_mode=S_IFREG|0644, st_size=362, ...}) = 0
stat("Pin.xml", {st_mode=S_IFREG|0644, st_size=362, ...}) = 0
open("Pin.xml", O_RDONLY)               = 3
lseek(3, 0, SEEK_CUR)                   = 0
read(3, "<?xml version=\"1.0\"?>\n\n<servers>\n\n<server_info>\n\n    <server_name>Server1</server_name>\n\n    <server_ip>127.0.0.1</server_ip> \n\n    <server_data_port>9000</server_data_port> \n\n</server_info>\n\n<server_info>\n\n    <server_name>Server2</server_name> \n\n    <ser"..., 8192) = 362
read(3, "", 7830)                       = 0
getcwd("/tmp", 1024)                    = 5
close(3)                                = 0
exit_group(0)                           = ?
+++ exited with 0 +++

Although this is Fedora with slightly newer libxml2 and gcc, that difference does not matter here. There is nothing wrong with the code shown.

But it is obviously part of a much larger application. Your memory corruption is happening in some other part of that application, and it only manifests itself when execution reaches this point.

The thing about C++ is that just because code crashes at a particular line doesn't mean that line is where the problem is. It's easy to come up with a simple example:

#include <iostream>
#include <cstring>

int main()
{

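    char foo[3];

    // Deliberate bug: copies a long string into a 3-byte buffer,
    // writing far past the end of foo (undefined behavior).
    strcpy(foo, "FoobarbazXXXXXXXXXXXXXXXXXXXXXX");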

    for (int i=0; i<100; i++)
        std::cout << i << std::endl;
    return 0;
}

The bug here is obviously on the strcpy line. Yet the program will typically run just fine, print the 100 numbers from 0 to 99, and only crash when main() returns. Obviously, "return 0" is not where the bug is.

This is analogous to what's happening in your application. Some kind of memory corruption occurs at some point, but it doesn't materially affect execution until your code tries to parse your XML file.

Welcome to C++.

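The problem was that we were using #pragma pack(1) in our code, which meant the bool in DOMParser was packed into 1 byte, whereas Xerces was built without #pragma pack and got the default packing of 4 bytes.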
