Meeting update

I am trying to debug a segfault on Bluehive (BH) that occurs when I run a 3D AMR simulation of the MHD colliding flows module in the debug queue:

http://astrobear.pas.rochester.edu/trac/astrobear/ticket/326

I had a bit of trouble running with Valgrind, but figured out why (I will add this to the wiki page on Valgrind). First, I needed to load the valgrind module on Bluehive. Second, I needed to modify my .pbs script from this call,

mpirun -np $NUMNODES valgrind -v --leak-check=full --leak-resolution=high --num-callers=40 --read-var-info=yes --track-origins=yes --log-file="valgrind_dump.txt" -machinefile nodefile.$PBS_JOBID ./$PROG > astrobear.log

to this call,

mpirun -np $NUMNODES valgrind -v --leak-check=full --leak-resolution=high --num-callers=40 --read-var-info=yes --track-origins=yes --log-file="valgrind_dump.txt" ./$PROG

Note that I removed the additional log files (machine log and astrobear log). I would like to have those printed as well, and hope that by fiddling with the command I can get it to do so. For now, though, I am running with Valgrind and will add the output to the ticket shortly.
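One way the two log files might be kept is sketched below. It assumes the original call failed because -machinefile came after valgrind on the command line, so valgrind tried to parse it as one of its own options. Moving it in front of valgrind hands it back to mpirun. The function name run_valgrind, the DRY_RUN switch, and the %p (process ID) in the log-file name are additions for illustration and are not part of the original script.

```shell
#!/bin/bash
# Sketch only: mpirun options go BEFORE "valgrind", valgrind options after it.
# run_valgrind and DRY_RUN are hypothetical names introduced for this example.

run_valgrind() {
    # $1 = number of MPI processes, $2 = machinefile, $3 = program to run
    set -- mpirun -np "$1" -machinefile "$2" \
        valgrind -v --leak-check=full --leak-resolution=high \
        --num-callers=40 --read-var-info=yes --track-origins=yes \
        --log-file=valgrind_dump.%p.txt "$3"

    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Print the assembled command instead of executing it
        echo "$@"
    else
        # Program output goes to astrobear.log, as in the original call
        "$@" > astrobear.log
    fi
}

# Dry run: show the command that would be launched
DRY_RUN=1
run_valgrind 4 nodefile.example ./astrobear
```

In the .pbs script the call would presumably become run_valgrind $NUMNODES nodefile.$PBS_JOBID ./$PROG with DRY_RUN unset. Valgrind replaces %p with the process ID, so each MPI rank writes its own dump file rather than all ranks sharing valgrind_dump.txt.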

Another problem I ran into when trying to debug the segfault was that I was unable to get error logs when running with debug flags. Christina was also not getting traceback information. Jonathan suspected this might be due to the stack limit on Bluehive, and suggested the following:

-create a launch.s script in your run directory that looks like this:

#!/bin/bash
ulimit -s unlimited
./astrobear

-then make it executable by running

chmod u+x launch.s

-then modify your pbs script to run launch.s instead of astrobear, i.e.,

mpirun -n 64 launch.s

I am also trying this, and hope to have an update by the meeting.
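To check whether the stack limit is really being lifted on each node, a variant of the launch script could report the limit before starting the program. This is a minimal sketch, not the script Jonathan suggested: the function name report_and_run is made up for this example, and passing the program as an argument is an assumption of this sketch.

```shell
#!/bin/bash
# Sketch of a launch.s variant that reports the stack limit actually in effect.
# report_and_run is a hypothetical helper introduced for this example.

report_and_run() {
    # Try to lift the stack limit; warn if the hard limit does not allow it
    if ! ulimit -s unlimited 2>/dev/null; then
        echo "warning: could not raise stack limit on $(uname -n)" >&2
    fi

    # Report the limit now in effect on this node (stderr, so it is kept
    # apart from the program's own output)
    echo "stack limit on $(uname -n): $(ulimit -s)" >&2

    # Run the program given as arguments under the new limit
    "$@"
}

# Only launch something when a program was passed on the command line
if [ "$#" -gt 0 ]; then
    report_and_run "$@"
fi
```

With this variant the call in the pbs script would look like mpirun -n 64 launch.s ./astrobear, and each rank prints one line to stderr naming its node and the stack limit it is running under.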

I am in communication with a CIRC tech to see if he can help me fix a bug in bluehive.log. When you run a sim on 1 node, two helpful values are reported: 1) total memory requested, and 2) memory used. However, when you run on multiple nodes, the second value is reported as N/A. It would be nice to have both values as an additional measure of memory usage. I also learned that you can only request 2 nodes in the debug queue, even though the CIRC wiki states you can request up to 16 processors in that queue. This is interesting given that more than 2 nodes are available for the debug queue.
