
Debugging MPI Remotely Using GDB

I am trying to debug MPI code that I wrote, running on a remotely accessed cluster of Raspberry Pis. I cannot access the Pis directly, so I cannot use a GUI to debug the code.

I have tried using screen as shown in this question, but any time I try to use screen I get this message:

There are not enough slots available in the system to satisfy the 2 slots
that were requested by the application:
  screen

Either request fewer slots for your application, or make more slots available
for use.

If I try to tell it to use just 1 screen, mpiexec fails:

mpiexec -N 16 --host 10.0.0.3 -np 1 screen -oversubscribe batSRTest3 shortpass.bat
--------------------------------------------------------------------------
mpiexec was unable to find the specified executable file, and therefore
did not launch the job.  This error was first reported for process
rank 0; it may have occurred for other processes as well.

NOTE: A common cause for this error is misspelling a mpiexec command
      line parameter option (remember that mpiexec interprets the first
      unrecognized command line token as the executable).

Node:       node1
Executable: screen

I have looked at the Open MPI FAQ, but the information does not apply to remote access. I tried following this part, but when I type in

gdb --pid

with the code running, nothing happens. Method 2 in that section also will not work, as I cannot open multiple windows when accessing the Pis using PuTTY.

Ideally, I want to be able to debug it while running on all of the nodes. Currently, to run my program I use:

$ mpiexec -N 4 --host 10.0.0.3,10.0.0.4,10.0.0.5,10.0.0.6 -oversubscribe batSRTest shortpass.bat

This is also causing confusion, as I'm not even sure I am passing the extra arguments correctly.

I did try debugging using gdb similar to the answer shared here, but that just resulted in MPI failing since it wasn't given multiple tasks.

(gdb) exec-file batSRTest3
(gdb) run
Starting program: /home/pi/progs/batSRTest3 mpiexec -N 16 --host 10.0.0.3 -oversubscribe batSRTest3 shortpass.bat
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
[Detaching after fork from child process 17157]
[New Thread 0x7691a460 (LWP 17162)]
[New Thread 0x75d3d460 (LWP 17163)]
[node1:17153] *** An error occurred in MPI_Group_incl
[node1:17153] *** reported by process [141361153,0]
[node1:17153] *** on communicator MPI_COMM_WORLD
[node1:17153] *** MPI_ERR_RANK: invalid rank
[node1:17153] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[node1:17153] ***    and potentially your MPI job)
[Thread 0x7691a460 (LWP 17162) exited]
[Thread 0x76ff5010 (LWP 17153) exited]
[Inferior 1 (process 17153) exited with code 06]
(gdb) q

The problem with debugging MPI applications is that they run as multiple processes, and often you do not have direct access to those processes. Therefore, special parallel debuggers exist that are able to integrate themselves into the MPI job. The two most popular ones are TotalView and Arm DDT (formerly known as Allinea DDT). Both are expensive commercial products, but many academic institutions buy licenses, so check if that is the case with yours. The poor man's solution is to use GDB, which is not a parallel debugger per se, so one has to get creative.

In a nutshell, the idea is to launch your MPI processes under GDB's supervision. But first, let's look at how Open MPI executes a job on multiple nodes. The following diagram should illustrate it:

mpiexec <--+--> orted on node1 <--+--> rank 0
           |                      |
           |                      +--- rank 1
           |                      :
           |                      +--- rank N-1
           |
           +--- orted on node2 <--+--- rank N
           |                      |
           |                      +--- rank N+1
           |                      :
           :                      +--- rank 2N-1

mpiexec is the MPI program launcher. It is responsible for reading in information such as the number of MPI ranks, host lists, binding policy, etc., and for using that information to launch the job. For processes on the same host as the one where mpiexec was executed, it simply spawns the executable a number of times. For processes on remote nodes, it uses RSH, SSH, or some other mechanism (srun for SLURM, TM2, etc.) to start the orted helper program on each remote host, which then spawns as many ranks on its particular host as necessary.
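A quick way to observe this placement (a trivial check, reusing the hosts and flags from your own command) is to launch a non-MPI program such as hostname:

$ mpiexec -N 2 --host 10.0.0.3,10.0.0.4 -oversubscribe hostname

Each orted spawns two copies of hostname on its node, so you should see each node's hostname printed twice.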

Unlike regular Unix programs, you never interact directly with the MPI processes through the console or via Unix signals. Instead, the MPI runtime provides mechanisms for I/O forwarding and signal propagation. You interact with the standard input and output of mpiexec, which then uses some infrastructure to send your input to rank 0 and to show you the output received from all ranks. Similarly, signals sent to mpiexec are translated and propagated to the MPI ranks. Neither I/O redirection nor signal propagation is fully specified in the MPI standard, because they are very platform-specific, but the general consensus among cluster implementations is that the standard output of all ranks gets forwarded to the standard output of mpiexec, while only rank 0 receives the standard input; the rest of the ranks have their standard input connected to /dev/null. This is shown with directed arrows in the diagram above. In fact, Open MPI allows you to select which rank will receive the standard input by passing --stdin rank to mpiexec.
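For example, to have rank 2 of a four-rank job receive the standard input instead of rank 0 (a minimal illustration using your executable):

$ mpiexec -np 4 --stdin 2 batSRTest shortpass.bat

Anything you then type into the terminal running mpiexec is forwarded to rank 2 only.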

If you do gdb mpiexec ..., you are not debugging the MPI application. Instead, you will be debugging the MPI launcher itself, which isn't running your code. You need to interpose GDB between the MPI runtime and the MPI ranks themselves, i.e., the above diagram should transform into:

mpiexec <--+--> orted on node1 <--+--> gdb <---> rank 0
           |                      |
           |                      +--- gdb <---> rank 1
           |                      :
           |                      +--- gdb <---> rank N-1
           |
           +--- orted on node2 <--+--- gdb <---> rank N
           |                      |
           |                      +--- gdb <---> rank N+1
           |                      :
           :                      +--- gdb <---> rank 2N-1

The problem now becomes how to interact with that multitude of GDB instances, mostly because you can directly talk to only one of them. With TotalView and DDT, there is a GUI that talks to the debugger components using network sockets, so this problem is solved. With many GDBs, you have a couple of options (or rather, hacks).

The first option is to debug only a single misbehaving MPI rank. If the error always occurs in the same rank, you can have it run under the control of GDB while the rest run on their own, and then use --stdin rank to tell mpiexec to let you interact with the debugger if that rank is not 0. You need a simple wrapper script (called debug_rank.sh):

#!/bin/sh
# Usage: debug_rank.sh <rank to debug> <executable> <arguments>

DEBUG_RANK=$1
shift
# OMPI_COMM_WORLD_RANK is set by Open MPI in the environment of each rank
if [ "$OMPI_COMM_WORLD_RANK" -eq "$DEBUG_RANK" ]; then
   # Run the selected rank under GDB; -ex=run starts it immediately
   exec gdb -ex=run --args "$@"
else
   # All other ranks run the executable directly
   exec "$@"
fi

The -ex=run option tells GDB to automatically execute the run command after loading the executable. You may omit it if you need to set breakpoints first. Use the wrapper like this, for example, to debug rank 3:

$ mpiexec ... --stdin 3 ./debug_rank.sh 3 batSRTest shortpass.bat

Once rank 3 does something bad or reaches a breakpoint, you'll be dropped into the GDB command prompt. You can also skip the wrapper script and run gdb directly, hoping that it won't drop into its command prompt on any rank other than the one you expect to be debugging. If that happens, GDB will exit because its standard input is connected to /dev/null, bringing down the whole MPI job, since mpiexec will notice one rank exiting without calling MPI_Finalize().
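For instance, since your output shows the job dying in MPI_Group_incl, you could drop -ex=run from the wrapper, in which case the debugged rank stops at the GDB prompt before the program starts and you can set a breakpoint at the failing call first (a sketch of such a session):

$ mpiexec -N 4 --host 10.0.0.3,10.0.0.4,10.0.0.5,10.0.0.6 -oversubscribe --stdin 3 ./debug_rank.sh 3 batSRTest shortpass.bat
(gdb) break MPI_Group_incl
(gdb) run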

If you don't know which particular rank misbehaves, or if it varies from run to run, or if you want to set breakpoints in more than one of them, then you need to get around the input redirection problem. The "simplest" solution is to use X11 terminal emulators such as xterm. The trick here is that GUI programs get their input from the windowing system and not from the standard input, so you can happily type in and send input to commands running inside xterm despite its standard input being connected to /dev/null. Also, X11 is a client/server protocol that can run over TCP/IP, allowing you to run xterm remotely and have it displayed on your local system, provided you run some X11 implementation such as X.org or XWayland. That's exactly what the command shown on the Open MPI page does:

$ mpiexec ... xterm -e gdb -ex=run --args batSRTest shortpass.bat

This starts many copies of xterm, and each copy executes gdb -ex=run --args batSRTest shortpass.bat. So you get many instances of GDB in their own terminal windows, which allows you to interact with any and all of them. For this to work, you need a couple of things:

  • there should be a copy of xterm installed on each Pi
  • your network should be a low-latency one, because the X11 protocol runs terribly slowly on networks with long delays
  • your X11 server should be reachable from all of the Pis and should be configured to accept connections from them
  • the DISPLAY environment variable should be set accordingly

Any X11 client application such as xterm uses the value of the DISPLAY environment variable to determine how to connect to the X11 server. Its value has the general form <optional hostname>:<display>[.<screen>]. For local servers managing a single display, DISPLAY is usually :0.0 or even just :0. When <optional hostname> is missing, a local connection is implied, which means that the X11 server is listening on a Unix domain socket located in /tmp/.X11-unix/. By default, for security reasons, X11 servers only listen on Unix domain sockets, which makes them unreachable for network clients. You need to enable listening on a TCP/IP socket and override the bind address, which is 127.0.0.1 by default, and make sure your host is reachable from the Pis, i.e., that they can directly connect to your IP address on the TCP port that the X11 server listens on. If you go this way, then it works like this:

  1. Enable TCP connections for X11 and make it listen on a networked interface
  2. Examine the value of DISPLAY on your system
  3. Prepend your IP address
  4. Run the MPI job like this:

$ mpiexec ... -x DISPLAY=your.ip:d.s xterm -e gdb -ex=run --args batSRTest shortpass.bat

where d.s stands for the display and screen values that your local DISPLAY variable is set to. Make sure your firewall allows inbound TCP connections on port 6000+d.
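How exactly you enable TCP listening depends on your X11 server; as a sketch, a manually started X.org server is typically given -listen tcp (on older versions, the default -nolisten tcp is removed instead), and the Pis then have to be allowed to connect, e.g. with xhost, assuming the addresses from your command:

$ xhost +10.0.0.3 +10.0.0.4 +10.0.0.5 +10.0.0.6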

Enabling TCP connections from the network is not always advisable or even possible, especially if you are behind NAT. Therefore, an alternative solution is to use X11 forwarding over SSH. For that, you need to pass -X or -Y to the SSH client when connecting to the SSH server:

 $ ssh -X username@server

-Y instead of -X enables trusted X11 forwarding, which is not subject to the X11 SECURITY extension restrictions and may be required for some X11 applications. X11 forwarding only works if it is enabled on the server side. It also requires that xauth be installed on the server. But simply enabling X11 forwarding on the server is not enough, since by default the SSH server listens on the loopback interface for X11 connections to forward. For OpenSSH, the following two configuration parameters must be set accordingly:

X11Forwarding yes    # Enable X11 forwarding
X11UseLocalhost no   # Listen on all network interfaces
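On most systems, these parameters live in /etc/ssh/sshd_config on the server; after editing the file, restart the SSH daemon for the change to take effect, e.g. (assuming a systemd-based distribution, where the service may be named ssh or sshd):

$ sudo systemctl restart sshd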

If the SSH server is configured correctly and the xauth command is present, when you SSH into the system the value of DISPLAY should be something like hostname:10.0, and running netstat -an | grep 6010 should produce something like this:

tcp        0      0 0.0.0.0:6010            0.0.0.0:*               LISTEN
tcp6       0      0 :::6010                 :::*                    LISTEN

indicating that the X11 forwarding sockets are bound to all network interfaces. You should then launch the MPI job like this:

$ mpiexec -x DISPLAY=server.ip:10.0 xterm -e gdb -ex=run --args batSRTest shortpass.bat

where server.ip is the IP address that the server has in the network connecting it to the Pis (I suspect that would be 10.0.0.1 in your case). Also, a range of TCP ports starting with 6010 should be open in the server's firewall; the actual range depends on how many X11 forwarding sessions there are. By default, X11DisplayOffset is set to 10, so the SSH server will start with display 10 and go up until an unallocated display number is found. Also, if your home directory on the Pis is not somehow shared with that on the server (e.g., via NFS mounts), you need to copy the .Xauthority file found in your home directory on the server to your home directory on all Pis. This file contains the MIT magic cookie needed to authenticate with the X11 forwarder, and it is regenerated each time you SSH into the server with X11 forwarding enabled, so make sure to copy it again to all Pis after each SSH login.
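Copying the cookie to all Pis can be scripted; a sketch assuming the default pi user and the addresses from your command:

$ for h in 10.0.0.3 10.0.0.4 10.0.0.5 10.0.0.6; do scp ~/.Xauthority pi@$h: ; done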

Now, if all this seems overly complex, GDB also has remote debugging abilities. You can start GDB on the server, run the remote MPI processes under the supervision of the GDB server program gdbserver, and then use the remote debugging commands in the local GDB to connect to one of the GDB servers. This is quite cumbersome, as you need to tell each GDB server to listen on a different port. A wrapper script (debug_server.sh) may help:

#!/bin/sh
# Usage: debug_server.sh <executable> <arguments>

GDB_HOST=$(hostname)
# Derive a unique port for each rank from the rank number set by Open MPI
GDB_PORT=$(( 60000 + $OMPI_COMM_WORLD_RANK ))
echo "GDB server for rank $OMPI_COMM_WORLD_RANK available on $GDB_HOST:$GDB_PORT"
# Run the executable under the control of gdbserver listening on that port
exec gdbserver :"$GDB_PORT" "$@"

Run like this:

$ mpiexec ... ./debug_server.sh batSRTest shortpass.bat

It will print the list of hostnames and ports that the different GDB server instances are listening on. Fire up GDB with no arguments and issue the following command:

(gdb) target remote hostname:port

where hostname and port are the hostname (if resolvable) or IP address of the Pi of interest and the port printed for the rank you want to debug. gdbserver automatically breaks at the entry point of the executable, which will most likely be somewhere in the dynamic linker, so you need to issue the continue command to make it run. You have to do that for each GDB server instance, and I don't know of a way to disconnect from the current target without stopping it, so you may need to start a load of GDBs too.
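You can shorten the procedure a bit by passing the command on GDB's command line; for example, assuming rank 0 runs on 10.0.0.3 and therefore listens on port 60000 per the wrapper above:

$ gdb batSRTest -ex 'target remote 10.0.0.3:60000'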

There may be some GUI front-end for GDB that simplifies this. You may want to look into the Eclipse PTP project, which provides a parallel debugger, and see whether it works for you. You may find these slides useful. I personally have never used PTP and have no idea what it can do.


That's basically why most MPI debugging is done using printf(), except for the most convoluted cases. Add --tag-output to the list of mpiexec arguments to make it prefix each output line with the job ID and rank ID it comes from, so you don't have to print that information yourself.
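With it, each output line is prefixed with [jobid,rank]<stream>:, so a run looks roughly like this (illustrative values):

$ mpiexec --tag-output -N 4 --host 10.0.0.3,10.0.0.4,10.0.0.5,10.0.0.6 -oversubscribe batSRTest shortpass.bat
[1,0]<stdout>: ...
[1,5]<stdout>: ...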

For a command-line-only experience, you can have a look at tmpi, which runs your MPI processes inside tmux panes (and multiplexes your keyboard input to each pane!).
