
How can I understand my Valgrind error message?

I am getting the following error message from Valgrind:

==1808== 0 bytes in 1 blocks are still reachable in loss record 1 of 1,734
==1808==    at 0x4A05E7D: malloc (vg_replace_malloc.c:309)
==1808==    by 0x4CC2BA9: hwloc_build_level_from_list (topology.c:1603)
==1808==    by 0x4CC2BA9: hwloc_connect_levels (topology.c:1774)
==1808==    by 0x4CC2F25: hwloc_discover (topology.c:2091)
==1808==    by 0x4CC2F25: opal_hwloc132_hwloc_topology_load (topology.c:2596)
==1808==    by 0x4C60957: orte_odls_base_open (odls_base_open.c:205)
==1808==    by 0x632FDB3: ???
==1808==    by 0x4C3B6B9: orte_init (orte_init.c:127)
==1808==    by 0x403E0E: orterun (orterun.c:693)
==1808==    by 0x4035E3: main (main.c:13)
==1808==
==1808== 0 bytes in 1 blocks are still reachable in loss record 2 of 1,734
==1808==    at 0x4A05E7D: malloc (vg_replace_malloc.c:309)
==1808==    by 0x4CC2BD5: hwloc_build_level_from_list (topology.c:1603)
==1808==    by 0x4CC2BD5: hwloc_connect_levels (topology.c:1775)
==1808==    by 0x4CC2F25: hwloc_discover (topology.c:2091)
==1808==    by 0x4CC2F25: opal_hwloc132_hwloc_topology_load (topology.c:2596)
==1808==    by 0x4C60957: orte_odls_base_open (odls_base_open.c:205)
==1808==    by 0x632FDB3: ???
==1808==    by 0x4C3B6B9: orte_init (orte_init.c:127)
==1808==    by 0x403E0E: orterun (orterun.c:693)
==1808==    by 0x4035E3: main (main.c:13)

I am not able to understand what kind of problem Valgrind is reporting. Would anybody be willing to explain?

I have checked every new in my code; each one is matched by a corresponding delete.

I am getting Valgrind error messages and a further error from MPI when the code ends:

----------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 1811 on node laki.pi.ingv.it exited on signal 11 (Segmentation fault).
----------------------------------------------------------------------

Here is the error message regarding MPI_Init:

==31198== 0 bytes in 1 blocks are still reachable in loss record 1 of 368
==31198==    at 0x4A05E7D: malloc (vg_replace_malloc.c:309)
==31198==    by 0xC66DE49: hwloc_build_level_from_list (topology.c:1603)
==31198==    by 0xC66DE49: hwloc_connect_levels (topology.c:1774)
==31198==    by 0xC66E1C5: hwloc_discover (topology.c:2091)
==31198==    by 0xC66E1C5: opal_hwloc132_hwloc_topology_load (topology.c:2596)
==31198==    by 0xC62B473: opal_hwloc_unpack (hwloc_base_dt.c:83)
==31198==    by 0xC6270AB: opal_dss_unpack_buffer (dss_unpack.c:120)
==31198==    by 0xC62815F: opal_dss_unpack (dss_unpack.c:84)
==31198==    by 0xC5F2349: orte_util_nidmap_init (nidmap.c:146)
==31198==    by 0xED98608: ???
==31198==    by 0xC5DC0B9: orte_init (orte_init.c:127)
==31198==    by 0xC59DBAE: ompi_mpi_init (ompi_mpi_init.c:357)
==31198==    by 0xC5B443F: PMPI_Init (pinit.c:84)
==31198==    by 0x55FA53: main (solver_2d.hpp:22)

where line solver_2d.hpp:22 is exactly:

MPI_Init(&argc, &argv);
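
To check whether these reports come from my code or from the MPI library itself, a minimal reproducer can help; the sketch below (min_mpi.cpp is a placeholder name, not part of my solver) does nothing but initialize and finalize MPI:

#include <mpi.h>
#include <cstdio>

// min_mpi.cpp: initialize and finalize MPI and nothing else.
// If Valgrind still shows "still reachable" blocks from
// hwloc/ORTE with this program, they originate inside the
// MPI library, not in the application code.
int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    std::printf("rank %d up\n", rank);
    MPI_Finalize();
    return 0;
}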

Further, the error message related to MPI_Finalize() is:

==31198== 1 errors in context 1 of 58:
==31198== Syscall param write(buf) points to uninitialised byte(s)
==31198==    at 0x38EF00E6FD: ??? (in /lib64/libpthread-2.12.so)
==31198==    by 0x11F1F548: ???
==31198==    by 0x11F1E03F: ???
==31198==    by 0x11CD7FBA: ???
==31198==    by 0x11CE519A: ???
==31198==    by 0x11CE3C37: ???
==31198==    by 0x11CD90C1: ???
==31198==    by 0x11AC2E36: ???
==31198==    by 0xC59ECC4: ompi_mpi_finalize (ompi_mpi_finalize.c:285)
==31198==    by 0x562185: main (solver_2d.hpp:171)
==31198==  Address 0x1ffeffda24 is on thread 1's stack
==31198==  Uninitialised value was created by a stack allocation
==31198==    at 0x11CCE050: ???

and

==31197== Syscall param write(buf) points to uninitialised byte(s)
==31197==    at 0x38EF00E6FD: ??? (in /lib64/libpthread-2.12.so)
==31197==    by 0x11F1F548: ipath_cmd_write (in /usr/lib64/libinfinipath.so.4.0)
==31197==    by 0x11F1E03F: ipath_poll_type (in /usr/lib64/libinfinipath.so.4.0)
==31197==    by 0x11CD7FBA: psmi_context_interrupt_set (in /usr/lib64/libpsm_infinipath.so.1.15)
==31197==    by 0x11CE519A: ips_ptl_rcvthread_fini (in /usr/lib64/libpsm_infinipath.so.1.15)
==31197==    by 0x11CE3C37: ??? (in /usr/lib64/libpsm_infinipath.so.1.15)
==31197==    by 0x11CD90C1: psm_ep_close (in /usr/lib64/libpsm_infinipath.so.1.15)
==31197==    by 0x11AC2E36: ompi_mtl_psm_finalize (mtl_psm.c:200)
==31197==    by 0xC59ECC4: ompi_mpi_finalize (ompi_mpi_finalize.c:285)
==31197==    by 0x562185: main (solver_2d.hpp:171)
==31197==  Address 0x1ffeffda24 is on thread 1's stack
==31197==  in frame #2, created by ipath_poll_type (???:)
==31197==  Uninitialised value was created by a stack allocation
==31197==    at 0x11CCE050: ??? (in /usr/lib64/libpsm_infinipath.so.1.15)

where line solver_2d.hpp:171 corresponds to:

MPI_Finalize();

Finally, the error message corresponding to the MPI write, or more precisely to MPI_File_open, reads:

==31198== 48 bytes in 1 blocks are still reachable in loss record 104 of 368
==31198==    at 0x4A05E7D: malloc (vg_replace_malloc.c:309)
==31198==    by 0xC58C750: opal_obj_new (opal_object.h:469)
==31198==    by 0xC58C750: ompi_attr_set_c (attribute.c:761)
==31198==    by 0xC5AA0BE: PMPI_Attr_put (pattr_put.c:58)
==31198==    by 0x118501AB: ???
==31198==    by 0x11843159: ???
==31198==    by 0x1185657D: ???
==31198==    by 0xC5CEFB5: module_init (io_base_file_select.c:442)
==31198==    by 0xC5CEFB5: mca_io_base_file_select (io_base_file_select.c:214)
==31198==    by 0xC5977A5: ompi_file_open (file.c:128)
==31198==    by 0xC5C6557: PMPI_File_open (pfile_open.c:96)
==31198==    by 0x5638A1: p_fstream (p_fstream.hpp:86)

where line p_fstream.hpp:86 is:

MPI_File_open(MPI_COMM_WORLD, const_cast<char*>(fname.c_str()), flags, MPI_INFO_NULL, &mpi_file);
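
For context, p_fstream follows the usual open/write/close pattern; a condensed, hypothetical sketch of it (fname and mpi_file are named as in my code, the written data is illustrative) is:

#include <mpi.h>
#include <string>

// Condensed sketch of the open/write/close sequence. MPI-2
// declares the filename parameter as char*, hence the
// const_cast; MPI-3 later changed it to const char*.
void write_file(const std::string& fname)
{
    MPI_File mpi_file;
    int flags = MPI_MODE_CREATE | MPI_MODE_WRONLY;
    MPI_File_open(MPI_COMM_WORLD, const_cast<char*>(fname.c_str()),
                  flags, MPI_INFO_NULL, &mpi_file);
    double x = 0.0;
    MPI_File_write(mpi_file, &x, 1, MPI_DOUBLE, MPI_STATUS_IGNORE);
    // Closing the handle releases the library's internal
    // allocations; an unclosed MPI_File keeps them reachable.
    MPI_File_close(&mpi_file);
}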

The Valgrind messages report memory that is still reachable inside mpirun itself, so you probably should not care much about them.

I assume you ran

valgrind mpirun a.out

but you really want to look for incorrect memory accesses and leaks in the MPI app itself. In that case, you should run

mpirun valgrind a.out

Note that the outputs of all ranks will be interleaved; since you are using Open MPI, you can run

mpirun --tag-output valgrind a.out

to have each task's output prefixed with its rank value.
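
If the remaining reports from the InfiniPath/PSM libraries bother you, Valgrind can suppress them. Running with --gen-suppressions=all makes Valgrind print a ready-made suppression entry after each error; collected into a file, an entry for the uninitialised write above would look roughly like this (the name is arbitrary and the object path must match your system):

{
   psm_write_uninitialised
   Memcheck:Param
   write(buf)
   ...
   obj:/usr/lib64/libpsm_infinipath.so.1.15
}

You can then pass the file on every run with

mpirun valgrind --suppressions=psm.supp a.out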
