
Double free or corruption (out) on assignment operator

I am working on a parallel code. My main function has a loop over time steps, and at the start of each step I need to copy an object using its assignment operator. At the 4th step, a "double free or corruption" error occurs on one of the processors, while the others are fine; the error is raised inside std::set or std::map. Below are the relevant part of the class and the main loop.

    class Mesh
    {
      public:

        const Mesh &operator=(const Mesh &mesh);

        std::set<size_t> ghostSet;
        std::map<size_t, size_t> localIndex;
    };

Assignment operator:

    const Mesh &Mesh::operator=(const Mesh &mesh)
    {
      std::set<size_t>().swap(ghostSet);  ///BUG here
      std::map<size_t, size_t>().swap(localIndex); /// BUG sometimes here
      for(auto const &it : mesh.localIndex)
        localIndex[it.first] = it.second;
      for(auto const &it : mesh.ghostSet)
        ghostSet.insert(it);
      return *this;
    }

main function:

    int main(int argc, char *argv[])
    {
      Mesh ms, ms_gh;
      /// Some operation to ms;
      for(size_t t = 0; t != 10; t++)
      {
        /// Some operation to ms;
        ms_gh = ms;
        /// Some operation to ms_gh;
      }
    }

Backtrace:

    #0  0x00002aaab2405207 in raise () from /lib64/libc.so.6
    #1  0x00002aaab24068f8 in abort () from /lib64/libc.so.6
    #2  0x00002aaab2447cc7 in __libc_message () from /lib64/libc.so.6
    #3  0x00002aaab2450429 in _int_free () from /lib64/libc.so.6
    #4  0x000000000041bfba in __gnu_cxx::new_allocator<std::_Rb_tree_node<unsigned long> >::deallocate (this=0x7fffffff8b50, __p=0x7131c0)
        at /usr/include/c++/4.8.2/ext/new_allocator.h:110
    #5  0x000000000041835c in std::_Rb_tree<unsigned long, unsigned long, std::_Identity<unsigned long>, std::less<unsigned long>, std::allocator<unsigned long> >::_M_put_node (this=0x7fffffff8b50, __p=0x7131c0)
        at /usr/include/c++/4.8.2/bits/stl_tree.h:374
    #6  0x000000000041276e in std::_Rb_tree<unsigned long, unsigned long, std::_Identity<unsigned long>, std::less<unsigned long>, std::allocator<unsigned long> >::_M_destroy_node (this=0x7fffffff8b50, __p=0x7131c0)
        at /usr/include/c++/4.8.2/bits/stl_tree.h:422
    #7  0x000000000040c8ad in std::_Rb_tree<unsigned long, unsigned long, std::_Identity<unsigned long>, std::less<unsigned long>, std::allocator<unsigned long> >::_M_erase (this=0x7fffffff8b50, __x=0x7131c0)
        at /usr/include/c++/4.8.2/bits/stl_tree.h:1127
    #8  0x000000000040c88a in std::_Rb_tree<unsigned long, unsigned long, std::_Identity<unsigned long>, std::less<unsigned long>, std::allocator<unsigned long> >::_M_erase (this=0x7fffffff8b50, __x=0x72f410)
        at /usr/include/c++/4.8.2/bits/stl_tree.h:1125
    #9  0x000000000040c88a in std::_Rb_tree<unsigned long, unsigned long, std::_Identity<unsigned long>, std::less<unsigned long>, std::allocator<unsigned long> >::_M_erase (this=0x7fffffff8b50, __x=0x72b760)
        at /usr/include/c++/4.8.2/bits/stl_tree.h:1125
    #10 0x000000000040c88a in std::_Rb_tree<unsigned long, unsigned long, std::_Identity<unsigned long>, std::less<unsigned long>, std::allocator<unsigned long> >::_M_erase (this=0x7fffffff8b50, __x=0x70fce0)
        at /usr/include/c++/4.8.2/bits/stl_tree.h:1125
    #11 0x00000000004080c4 in std::_Rb_tree<unsigned long, unsigned long, std::_Identity<unsigned long>, std::less<unsigned long>, std::allocator<unsigned long> >::~_Rb_tree (this=0x7fffffff8b50, __in_chrg=<optimized out>)
        at /usr/include/c++/4.8.2/bits/stl_tree.h:671
    #12 0x0000000000407bbc in std::set<unsigned long, std::less<unsigned long>, std::allocator<unsigned long> >::~set (this=0x7fffffff8b50,
        __in_chrg=<optimized out>) at /usr/include/c++/4.8.2/bits/stl_set.h:90
    #13 0x0000000000405003 in Mesh::operator= (this=0x7fffffffa8a0, mesh=...)
        at mesh.cpp:73
    #14 0x000000000048eb98 in DynamicMesh::reattach_ghost (mpi_comm=1140850688,
        ms=..., cn=..., ms_gh=..., gh=..., cn_gh=..., ale=..., t=4)
        at dynamicMesh.cpp:273

In this case, frame #13 of the backtrace corresponds to the line that swaps the std::set.

My question is why this error does not appear at the first time step, and why it does not appear on all processors. Moreover, the error sometimes occurs in the std::map-related line instead.

Additionally, the code runs successfully on my macOS and Linux laptops, but it fails on the HPC system.

Far too complex! Step 1: Both std::set and std::map have a clear() member function, so there is no need to swap with empty temporaries:

    /* const */ Mesh& Mesh::operator=(Mesh const& other)
    // Why return const? '*this' isn't const either. If anything, const
    // only prevents using the result directly afterwards:
    //     Mesh x, y;
    //     (x = y).someNonConstFunction();
    {
        //std::set<size_t>().swap(ghostSet);  ///BUG here
        //std::map<size_t, size_t>().swap(localIndex); /// BUG sometimes here

        localIndex.clear();
        for(auto const &it : other.localIndex)
            localIndex[it.first] = it.second;

        ghostSet.clear();
        for(auto const &it : other.ghostSet)
            ghostSet.insert(it);

        // a non-void function must return a value
        return *this;
    }

The clearing was reordered above only to illustrate step 2 better: both std::map and std::set already provide copy assignment operators that do exactly what clearing followed by the copy loop does:

    Mesh& Mesh::operator=(Mesh const& other)
    {
        //localIndex.clear();
        //for(auto const &it : other.localIndex)
        //    localIndex[it.first] = it.second;
        localIndex = other.localIndex;

        //ghostSet.clear();
        //for(auto const &it : other.ghostSet)
        //    ghostSet.insert(it);
        ghostSet = other.ghostSet;

        // return a (non-const) reference to the assigned-to object:
        return *this;
    }

Step 3: The operator above now does exactly what the default assignment operator does. The only difference is that the default assigns the members in the order they are declared, so it would assign the set first, then the map. Assuming the assignment order is irrelevant, you finally get to:

    class Mesh
    {
      public:
        // inside the class body the name must not be qualified with Mesh::
        Mesh& operator=(Mesh const& other) = default;
    };
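As a quick sanity check of step 3, here is a minimal sketch, assuming a Mesh reduced to the two containers from the question. The helper name `assignmentMakesIndependentCopy` and the sample values are made up for illustration. It shows that the compiler-generated assignment replaces the old content of the target and leaves the two objects independent:

```cpp
#include <cstddef>
#include <map>
#include <set>

// Same two members as in the question, but no hand-written operator=:
// the compiler generates a member-wise copy assignment.
class Mesh
{
  public:
    std::set<std::size_t> ghostSet;
    std::map<std::size_t, std::size_t> localIndex;
};

// Hypothetical check: returns true if assignment replaces the old content
// of the target and the two objects stay independent afterwards.
bool assignmentMakesIndependentCopy()
{
    Mesh ms;
    ms.ghostSet = {1, 2, 3};
    ms.localIndex = {{10, 0}, {11, 1}};

    Mesh ms_gh;
    ms_gh.ghostSet.insert(99);   // stale entry that must disappear

    ms_gh = ms;                  // implicitly generated copy assignment
    bool const equal = ms_gh.ghostSet == ms.ghostSet
                    && ms_gh.localIndex == ms.localIndex;

    ms_gh.ghostSet.insert(4);    // modify only the copy
    bool const independent = ms.ghostSet.count(4) == 0;

    return equal && independent;
}
```

If the real class has further members that need special handling (raw pointers, for example), the defaulted operator may not be enough. That is beyond what the question shows.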

> I am working on a parallel code [...]

Be aware that in any case, assignments are not thread-safe (neither was your original code with loops). It is quite likely that your double-free issue simply results from simultaneous access to either the set or the map. You will have to protect both containers from being accessed while the operator is still running, e.g. via a mutex.

You now have two options: The first is to make the class itself thread-safe by acquiring the mutex every time it is accessed (getters as well!). However, returning any contents by reference or pointer then becomes unsafe, as the lock is no longer held once the getter returns. If you return by value anyway, there is no problem.
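The first option might look roughly like the following sketch. The class name `SyncedMesh` and its member functions are hypothetical, not taken from the question. It assumes threads within one process and only C++11 facilities:

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <set>

// Hypothetical internally synchronised variant of the Mesh class.
class SyncedMesh
{
  public:
    SyncedMesh() = default;

    SyncedMesh& operator=(SyncedMesh const& other)
    {
        if (this != &other)   // locking the same mutex twice would deadlock
        {
            // std::lock acquires both mutexes without risking deadlock;
            // the guards adopt them and release them on scope exit.
            std::lock(mutex_, other.mutex_);
            std::lock_guard<std::mutex> self(mutex_, std::adopt_lock);
            std::lock_guard<std::mutex> src(other.mutex_, std::adopt_lock);
            ghostSet_ = other.ghostSet_;
            localIndex_ = other.localIndex_;
        }
        return *this;
    }

    void addGhost(std::size_t id)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        ghostSet_.insert(id);
    }

    // Returned by value: the caller gets a snapshot that stays valid
    // after the lock has been released.
    std::set<std::size_t> ghosts() const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return ghostSet_;
    }

  private:
    mutable std::mutex mutex_;   // mutable so const getters can lock it
    std::set<std::size_t> ghostSet_;
    std::map<std::size_t, std::size_t> localIndex_;
};
```

Returning the set by value costs a copy on every call, which is the price for not handing out references into the protected data.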

The other variant is leaving correct thread synchronisation to the user. This avoids the problems above, as the user would lock the mutex before getting a reference, hold it for as long as the reference is in use, and only then release it.
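A minimal sketch of this user-side locking follows. It assumes the mesh and its mutex are simply bundled together; `SharedMesh`, `assignMesh` and `sumLocalIndices` are hypothetical names for illustration:

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <set>

struct Mesh
{
    std::set<std::size_t> ghostSet;
    std::map<std::size_t, std::size_t> localIndex;
};

// Hypothetical bundle: the mesh plus the mutex every user agrees to lock.
struct SharedMesh
{
    std::mutex mutex;
    Mesh mesh;
};

// Writer: holds the lock for the whole assignment.
void assignMesh(SharedMesh& target, Mesh const& source)
{
    std::lock_guard<std::mutex> lock(target.mutex);
    target.mesh = source;
}

// Reader: takes a reference only while the lock is held.
std::size_t sumLocalIndices(SharedMesh& shared)
{
    std::lock_guard<std::mutex> lock(shared.mutex);
    std::map<std::size_t, std::size_t> const& index = shared.mesh.localIndex;
    std::size_t sum = 0;
    for (auto const& entry : index)
        sum += entry.second;
    return sum;
}   // the lock is released here, after the last use of the reference
```

This only works if every piece of code touching the mesh follows the same convention; nothing in the types enforces it.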

The approach above could be improved with read/write locks, where the write lock is only held while the object is modified (new items added to the map or set, or assignment as above). Modifying single elements is the critical case: one would need to hold the write lock as well, unless the elements provide a mutex or similar of their own or can be modified atomically (or with some lock-free algorithm).
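As a rough illustration of the read/write idea, here is a sketch assuming C++17 (`std::shared_mutex`); the class `IndexTable` and its member names are hypothetical. Lookups share the lock, while inserting, overwriting a single element and whole-map assignment all take it exclusively:

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <shared_mutex>

// Hypothetical wrapper around the localIndex map from the question.
class IndexTable
{
  public:
    // Readers take the shared lock, so several lookups may run at once.
    bool lookup(std::size_t global, std::size_t& local) const
    {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        auto const it = localIndex_.find(global);
        if (it == localIndex_.end())
            return false;
        local = it->second;
        return true;
    }

    // Inserting or overwriting a single element needs the exclusive lock,
    // because the plain size_t values have no protection of their own.
    void set(std::size_t global, std::size_t local)
    {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        localIndex_[global] = local;
    }

    // Whole-map assignment is a modification as well.
    void replaceAll(std::map<std::size_t, std::size_t> const& other)
    {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        localIndex_ = other;
    }

  private:
    mutable std::shared_mutex mutex_;
    std::map<std::size_t, std::size_t> localIndex_;
};
```

Whether this pays off compared with a plain mutex depends on how read-heavy the access pattern is, which the question does not tell us.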
