Debug Team Procedures
New Tickets
- When a new ticket arrives, the debug team leader will assign a debug team member (owner) to the ticket.
- The ticket owner should establish contact with the reporter and verify:
- That their environment settings, compiler flags, and submission scripts are correct.
- That they are using the most current version of the code.
- The ticket owner will report back to the debug team leader, who will decide whether or not to change the ticket's priority. Most tickets will probably fall within the 2-4 range; only the debug team leader may set tickets as priority 1.
- The ticket owner should work with the reporter to reproduce the bug. This will involve access to some or all of the following:
- The reporter's data files.
- The reporter's AstroBEAR build.
- The cluster on which the error occurred.
- The PBS script used to submit the job.
- The reporter's .bashrc and .bash_profile contents.
- Run the problem with debug flags turned on. If any pointer or memory allocation issues are found, fix them first. If segfaults or other memory errors still occur without being caught by the compiler's error checking, then:
- Make absolutely sure the stack size is large enough (preferably unlimited).
- Run the problem somewhere you can monitor the memory usage (on bluehive, or on bluegene with totalview) and check that it is within the available resources.
- Run a much lower resolution run for a fraction of the runtime, but with all of the physics turned on, using valgrind or totalview, and fix any errors that appear.
- If no pointer/memory errors are found and the run does not segfault, but the bug persists, try the following options:
- Lower the domain resolution.
- Turn off AMR (i.e., set MaxLevel in global.data to 0).
- Turn off source terms.
- Set lGudonov to true.
- Set InterpOpts to 0 in physics.data.
- Use a different problem module.
The goal here is to simplify the setup by turning off extra features until the error no longer manifests. Once the error goes away, try switching the deactivated features back on until you have the closest thing to the original setup that still works. The features that must remain deactivated should provide further clues about the fault.
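The simplify-then-restore procedure described above can be sketched as a small search loop. The Python outline below is only an illustration under stated assumptions: isolate_faulty_features, the run_ok callback, and the feature labels are hypothetical names that are not part of AstroBEAR. run_ok stands in for whatever manual or scripted step edits the data files, reruns the problem, and reports whether the run completed without the error.

```python
from typing import Callable, Iterable, List, Set, Tuple


def isolate_faulty_features(
    features: Iterable[str],
    run_ok: Callable[[Set[str]], bool],
) -> Tuple[Set[str], List[str]]:
    """Return (largest working feature set found, features that must stay off).

    `features` are hypothetical labels for things that can be switched off
    (for example AMR or source terms). `run_ok(active)` is an assumed callback
    that reruns the problem with only `active` enabled and returns True when
    the error does not appear.
    """
    ordered = list(features)
    active = set(ordered)
    disabled: List[str] = []

    # Phase 1: switch features off one at a time until the error stops.
    for name in ordered:
        if run_ok(active):
            break
        active.discard(name)
        disabled.append(name)
    else:
        # Every feature was switched off; check the fully simplified setup.
        if not run_ok(active):
            raise RuntimeError("error persists with every listed feature off")

    # Phase 2: switch features back on one at a time, keeping each one
    # only if the run still works with it enabled.
    still_off: List[str] = []
    for name in disabled:
        trial = active | {name}
        if run_ok(trial):
            active = trial
        else:
            still_off.append(name)
    return active, still_off


if __name__ == "__main__":
    # Stand-in for real runs: pretend the bug needs "interpolation" enabled.
    def fake_run_ok(active: Set[str]) -> bool:
        return "interpolation" not in active

    labels = ["amr", "source_terms", "interpolation", "high_resolution"]
    working, suspects = isolate_faulty_features(labels, fake_run_ok)
    print(sorted(working))  # ['amr', 'high_resolution', 'source_terms']
    print(suspects)         # ['interpolation']
```

This sketch tests features one at a time, so it will not untangle bugs that only appear when two features interact. In that case the list it returns is a starting point for further manual tests rather than a diagnosis.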
- Prepare a report (i.e., summary comment on the ticket) for weekly debug meetings. This comment should describe the results of the ticket owner's tests. If possible, the ticket owner should also recommend further courses of action.
- When the ticket reporter (user) can successfully run their original problem without the error, the bug can be considered fixed. However, before closing the ticket, the debug team member must check any fixes into the development repository (after running the buildtests script) and add any documentation needed to help prevent the same bug from recurring.
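For the stack-size check in the memory-debugging step above, one quick way to see the limits a process actually inherits is to query them programmatically. The following is a minimal sketch, assuming a Unix-like system where Python's standard resource module is available. The helper name stack_limit_report is hypothetical and not part of any AstroBEAR tooling.

```python
import resource


def stack_limit_report() -> dict:
    """Report the soft and hard stack-size limits of the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)

    def fmt(value: int) -> str:
        # RLIM_INFINITY is the sentinel the OS uses for "no limit".
        if value == resource.RLIM_INFINITY:
            return "unlimited"
        return f"{value // 1024} KiB"

    return {
        "soft": fmt(soft),
        "hard": fmt(hard),
        "unlimited": soft == resource.RLIM_INFINITY,
    }


if __name__ == "__main__":
    report = stack_limit_report()
    print(f"stack soft limit: {report['soft']}, hard limit: {report['hard']}")
    if not report["unlimited"]:
        print("stack size is limited; consider raising it before the run")
```

The values reported are those of the shell or job environment that launched the script, so it is most informative when run from inside the same submission script as the failing job. In a bash script the soft limit is typically raised with `ulimit -s unlimited` before the executable is launched. Whether that succeeds depends on the hard limit set on the cluster.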