
Using gdb backtrace to debug MPI code

Running the program under gdb and requesting a backtrace gives the following output:

[Thread debugging using libthread_db enabled]
[New Thread 0x2aaaaffd3700 (LWP 32109)]
[Thread 0x2aaaaffd3700 (LWP 32109) exited]
Detaching after fork from child process 32110.
Detaching after fork from child process 32111.
Detaching after fork from child process 32112.
Detaching after fork from child process 32113.
Detaching after fork from child process 32114.
Detaching after fork from child process 32115.
Detaching after fork from child process 32116.
Detaching after fork from child process 32117.
Detaching after fork from child process 32118.
Detaching after fork from child process 32119.
Detaching after fork from child process 32120.
Detaching after fork from child process 32121.
Detaching after fork from child process 32122.
Detaching after fork from child process 32123.
Detaching after fork from child process 32124.
Detaching after fork from child process 32125.
Detaching after fork from child process 32126.
Detaching after fork from child process 32127.
Detaching after fork from child process 32128.
Detaching after fork from child process 32129.
Detaching after fork from child process 32130.
Missing separate debuginfos, use: debuginfo-install     fftw-3.2.1-3.1.el6.x86_64 glibc-2.12-1.80.el6_3.5.x86_64 nss-pam-ldapd-0.7.5-14.el6_2.1.x86_64
Detaching after fork from child process 32131.
Detaching after fork from child process 32133.
Detaching after fork from child process 32134.
Detaching after fork from child process 32135.
Detaching after fork from child process 32136.
Detaching after fork from child process 32137.
Detaching after fork from child process 32138.
Detaching after fork from child process 32139.
Detaching after fork from child process 32140.
Detaching after fork from child process 32141.
Detaching after fork from child process 32142.
Detaching after fork from child process 32143.
Detaching after fork from child process 32144.

Program received signal SIGFPE, Arithmetic exception.

0x00000000004a3104 in phase::Mobility::Average ()
#0  0x00000000004a3104 in phase::Mobility::Average ()
#1  0x00000000004a3523 in phase::Mobility::Average(phase::Field&, phase::BoundaryConditions&) ()
#2  0x000000000046fcda in phase::Diffusion::CalculateMobility(phase::Field&, phase::Composition&, phase::BoundaryConditions&, phase::Mobility&) ()
#3  0x0000000000441a3e in MyParallelism<MyParallelBlock>::Run() ()
#4  0x00000000004436dc in main ()

What does the order of the functions in the output indicate? Should I be looking at the last function in the output? How can I narrow down the line that caused the arithmetic exception?

EDIT: Running with the -g option gives:

Program received signal SIGFPE, Arithmetic exception.
0x00000000004a5fa4 in phase::Mobility::Average ()
#0  0x00000000004a5fa4 in phase::Mobility::Average ()
#1  0x00000000004a63c3 in phase::Mobility::Average(phase::Field&, phase::BoundaryConditions&) ()
#2  0x0000000000472fea in phase::Diffusion::Mobility(phase::Field&, phase::Composition&, phase::BoundaryConditions&, phase::Mobility&) ()
#3  0x000000000042686e in MyParallelBlock::DoTimestep (this=0x7c9368)
    at Parallelism.cpp:100
#4  0x00000000004450d9 in MyParallelism<MyParallelBlock>::Run (
    this=0x7fffffffd2f0) at Parallelism.cpp:164
#5  0x0000000000446ad3 in main (argc=1, argv=0x7fffffffdcd8)
    at Parallelism.cpp:242

But the cause of the arithmetic exception is still not narrowed down. This only added the information that the exception occurs in the run loop, which was already known. I was expecting more information from within the function phase::Mobility::Average(). What is the significance of the numbers 0x0000000000446ad3, 0x00000000004450d9, etc.? Can I get any information out of these numbers?

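The gdb backtrace lists the frames of the call stack from innermost to outermost, which is the reverse of the order in which the calls were made. Frame #0 is the function that was executing when the program stopped, and each higher-numbered frame is the caller of the frame before it, ending at main.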

If gdb catches an arithmetic exception or a segmentation fault, the function in which the error occurred is shown as frame #0 of the backtrace.
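To make the frame order concrete, here is a minimal sketch of a call chain shaped like the one in the trace. The function names and the `entered` log are hypothetical and only mirror the structure of the question's output. The log records functions in the order they are entered, and a backtrace taken inside the innermost function would list them in the opposite order.

```cpp
#include <string>
#include <vector>

// Hypothetical call chain; the names only mirror the shape of the trace above.
std::vector<std::string> entered;  // functions in the order they were entered

int average(int sum, int count) {
    entered.push_back("average");             // innermost call: frame #0 if it stopped here
    return count != 0 ? sum / count : 0;      // guarded so the sketch itself never traps
}

int calculate_mobility(int sum, int count) {
    entered.push_back("calculate_mobility");  // caller of average: frame #1
    return average(sum, count);
}

int run(int sum, int count) {
    entered.push_back("run");                 // caller of calculate_mobility: frame #2
    return calculate_mobility(sum, count);
}
```

If main called run and the program stopped inside average, the frames would read average, calculate_mobility, run, main. So the first entry of the backtrace, not the last, is the place to look.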

To get the file and line where the error occurred, recompile your program with debug symbols using the compiler's -g flag. Recompile at least the files in which the failing function (frame #0 of the backtrace) is declared and implemented.

In your case, that means recompiling the file that implements the class/namespace phase::Mobility with the -g option. The frames that still show no file and line in your second trace come from files that were built without it.
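As for the hexadecimal numbers in each frame, they are code addresses. For frame #0 it is the instruction that was executing, and for the other frames it is the point the call returns to. Once the relevant file is built with -g, gdb's `info line *0x4a5fa4` maps such an address to a source line.

The trace does not show which operation trapped. On x86 Linux, integer division by zero is one common cause of SIGFPE, whereas floating-point division by zero normally produces inf or NaN without a signal unless traps were enabled. Purely as an assumption about what an averaging routine might do, the sketch below guards the divisor. `checked_average` is a hypothetical helper, not the asker's phase::Mobility::Average.

```cpp
#include <vector>

// Hypothetical integer average; NOT the asker's phase::Mobility::Average.
// Returns false instead of dividing when there is nothing to average,
// because an integer division by zero would trap (SIGFPE on x86 Linux).
bool checked_average(const std::vector<long>& values, long& result) {
    if (values.empty()) {
        return false;  // divisor would be zero: report failure, leave result untouched
    }
    long sum = 0;
    for (long v : values) {
        sum += v;
    }
    result = sum / static_cast<long>(values.size());  // divisor is non-zero here
    return true;
}
```

With debug symbols in place, the line gdb reports for frame #0 shows whether the fault really sits on a division like this one. If it does, printing the divisor in gdb at that frame confirms it.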
