
Version 5 (modified by Jonathan, 12 years ago)


Debugging


Important: When debugging, be sure to compile your code using our standard debugging flags:

-g -traceback -check bounds -check uninit -check pointers

The -g flag in particular is essential, as most of our debugging tools (gdb and valgrind included) rely on the debugging symbols that this flag produces.
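As a minimal sketch, the flags above can be kept in one shell variable and checked before a debug build. The helper name has_debug_flag, the compiler variable FC, and the source file name in the usage comment are illustrative assumptions, not part of our build system.

```shell
# Standard debugging flags from this page, kept in one place.
DEBUG_FLAGS="-g -traceback -check bounds -check uninit -check pointers"

# has_debug_flag FLAGS
# Succeeds only if the flag string contains -g as a separate word.
# (Hypothetical helper, shown for illustration.)
has_debug_flag() {
    case " $1 " in
        *" -g "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Hypothetical usage; FC and main.f90 are placeholders:
#   has_debug_flag "$DEBUG_FLAGS" && "$FC" $DEBUG_FLAGS -o mpibear main.f90
```

Padding the string with spaces before matching keeps a flag such as -gnu from being mistaken for -g.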


On Multiple Processors

To debug a multiprocessor run (say on 2 processors) you can run

mpiexec -np 2 xterm -e gdb mpibear

You should get a separate window for each processor. Go through each window and type

run

and press <Enter>. You should begin to see output from mpibear, with each processor running its own debugger on top of the code.
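Typing run in every window gets tedious with more than a few processors. One possible shortcut, sketched here, is gdb's -ex option, which executes a gdb command at startup. The function name gdb_mpi_cmd is a hypothetical helper. It only prints the command line so you can inspect it before running it.

```shell
# gdb_mpi_cmd NP EXE
# Print an mpiexec line that opens one xterm per process and has gdb
# issue "run" immediately through its -ex option.
# (Hypothetical helper; it prints the command, it does not execute it.)
gdb_mpi_cmd() {
    printf 'mpiexec -np %s xterm -e gdb -ex run %s\n' "$1" "$2"
}

# Example: gdb_mpi_cmd 2 mpibear
```

Paste the printed line into your shell to launch the debuggers. If the program crashes, each gdb stays open at the point of failure, so you can still type backtrace in the affected window.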


Using Valgrind

To run a program under valgrind, the code must be compiled with the -g flag and without any optimization flags. The valgrind executable is available on grass and on bluehive, and can be made available by adding its directory to your PATH in your .bashrc file. Depending on your machine, use one of the following lines:

grass.pas.rochester.edu: export PATH=/home/bshroyer/local/valgrind/bin:$PATH


bluehive.rochester.edu: export PATH=$PATH:/home/bshroyer/local/valgrind/bin


To run a single-processor executable under valgrind, use the command

valgrind -v --leak-check=full --leak-resolution=high --num-callers=40 --read-var-info=yes --track-origins=yes --log-file="valgrind_dump.txt" ./<program_name>

This will run the program using valgrind and print the valgrind results to the file valgrind_dump.txt.

To use valgrind on a multiprocessor job, simply prepend the MPI call to the command:

mpiexec -np <np> valgrind -v --leak-check=full --leak-resolution=high --num-callers=40 --read-var-info=yes --track-origins=yes --log-file="valgrind_dump.txt" ./<program_name>

This will dump all processors' valgrind data to a single output file.
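If interleaved output from several processes is hard to read, one alternative is to give each process its own log file. Valgrind replaces %p in the --log-file name with the process ID. The sketch below prints such a command without running it. The function name valgrind_mpi_cmd and the file name pattern valgrind_dump.%p.txt are illustrative choices.

```shell
# valgrind_mpi_cmd NP EXE
# Print an mpiexec + valgrind line using the options from this page,
# but with %p in the log file name so each process writes its own file.
# (Hypothetical helper; it prints the command, it does not execute it.)
valgrind_mpi_cmd() {
    printf 'mpiexec -np %s valgrind -v --leak-check=full --leak-resolution=high --num-callers=40 --read-var-info=yes --track-origins=yes --log-file=valgrind_dump.%%p.txt %s\n' "$1" "$2"
}

# Example: valgrind_mpi_cmd 2 ./mpibear
```

The files are named by process ID, not by MPI rank, so you may need to match them to ranks using the program's own output.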


Interpreting valgrind files is a skill we're still trying to master. The valgrind website has a lot of information, but in general I find that the best things to pay attention to are the LEAK SUMMARY and the 'Invalid [read|write] of size' lines.

The LEAK SUMMARY tells you how much memory has been lost. In this context, 'lost' means that it can no longer be reached because the pointers that referenced it were nullified before the memory was freed. Memory leaks have been a major thorn in our side in the past, and are definitely worth tracking down.

Invalid reads or writes occur when the code attempts to access memory when it shouldn't, either because of a double allocation or because the code is trying to read an unallocated memory block (such as one referenced by a pointer that was never nullified). These errors almost invariably lead to problems, and are good places to start.
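To pull just those two pieces out of a long log, a small grep wrapper may help. This is a sketch: the function name summarize_valgrind is hypothetical, it assumes a grep that supports -A (GNU and BSD grep do), and the five lines of context after LEAK SUMMARY are a guess at how long the summary block is.

```shell
# summarize_valgrind [FILE]
# Print the 'Invalid read/write of size' lines with their line numbers,
# then the LEAK SUMMARY header plus the five lines that follow it.
# FILE defaults to valgrind_dump.txt, the log name used on this page.
# (Hypothetical helper, shown for illustration.)
summarize_valgrind() {
    log="${1:-valgrind_dump.txt}"
    grep -n -E 'Invalid (read|write) of size' "$log"
    grep -n -A 5 'LEAK SUMMARY' "$log"
    return 0   # finding no matches is not treated as an error
}

# Example: summarize_valgrind valgrind_dump.txt
```

The line numbers in the output let you jump to the full stack trace for each error in the original file.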
