Causally Consistent Reversible Debugger for MPI Applications
Writing programs for parallel computation is significantly more difficult than writing programs for sequential execution. Debuggers are useful in multiple stages of software development, including implementation, analysis and maintenance. Some sophisticated debuggers offer, in addition to generic debugging commands, reversible debugging commands, which make it possible to step backwards through the program execution in some form. MPI (Message Passing Interface) is a widely used standard for developing parallel programs. This thesis presents the implementation of a causally consistent reversible debugger for MPI applications, that is, a debugger that offers reversible debugging commands while maintaining causal consistency. The debugger uses a distributed, independent checkpointing mechanism to record the execution of the MPI application and a coordinated restore mechanism to support reversible debugging of that application. To the best of the author’s knowledge, this is the first debugger for MPI that implements this kind of checkpointing mechanism to enable reversible debugging. The resulting tool demonstrates the viability of this checkpoint-restore mechanism for reversible debugging of parallel programs.
Reverse debugging, MPI, distributed debugging, checkpointing, parallel programming, reversibility