Sim updates and memory findings in bluehive.log

My 64^3 + 5 levels run was going fine until it produced some NaNs. After that, the memory use exceeded what I had requested, and bluehive.log gave me an odd error log (see attached). run:/scratch/ekamins2/CollidingFlows/TEST_2013NOV21/TestStackScript/5levels_nans I can restart from the frame before the NaNs, and possibly use a debug version of the code to try to get more helpful messages for debugging this.

I am also re-running with 4 levels to see how that run does. run:/scratch/ekamins2/CollidingFlows/TEST_2013NOV21/TestStackScript/4levels

With 2 levels (completed in ~10 hrs), at least, I do not think I saw these NaNs arise. run:/scratch/ekamins2/CollidingFlows/TEST_2013NOV21/TestStackScript and files:/grassdata/erica/from_bluehive/CollidingFlows/3DMHDUnlStack

I also found that, for some low-res test jobs, the memory used as reported in bluehive.log is ~1-2 GB higher than what standard out reports, and that the bluehive.log figure differs depending on whether I run on 1 or 2 nodes. Perhaps this mismatch between standard out and bluehive.log is due to some overhead in the inter-node, inter-core communication? run:/scratch/ekamins2/CollidingFlows/TEST_2013NOV21/TestBluehivelog

For the high-res job (64^3 + 5 levels), bluehive.log reported 80 GB of memory used. That is more than the 40 GB total I requested, and also more than the 50 GB that standard out reports I used. This suggests there may have been an overhead of about 30 GB. That is, maybe the simulation itself was using 50 GB, which already exceeded the memory request, but with the overhead it was actually using 80 GB, and it was finally killed by the queuing system.

This makes it somewhat unclear how much memory we should be requesting on the cluster, if the memory that standard out reports is not equal to the memory the cluster counts as used.
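To make the arithmetic above concrete, here is a minimal Python sketch using the three figures from the high-res job (40 GB requested, 50 GB per standard out, 80 GB per bluehive.log). The function names, the idea of scaling a request by the observed ratio, and the 10% safety margin are all assumptions made for illustration, not anything the cluster or the code provides. Note also that the low-res tests showed only ~1-2 GB of mismatch, so the overhead may not scale proportionally with job size.

```python
def overhead_gb(cluster_reported_gb, stdout_reported_gb):
    """Memory the cluster log counts beyond what standard out reports."""
    return cluster_reported_gb - stdout_reported_gb


def padded_request_gb(stdout_estimate_gb, overhead_factor, safety_margin=1.1):
    """Hypothetical sizing rule (an assumption, not a cluster policy):
    scale the standard-out estimate by an observed overhead factor,
    then add an assumed 10% safety margin."""
    return stdout_estimate_gb * overhead_factor * safety_margin


# Figures quoted in the post for the high-res job, in GB.
requested, stdout_used, cluster_used = 40.0, 50.0, 80.0

overhead = overhead_gb(cluster_used, stdout_used)   # 80 - 50
factor = cluster_used / stdout_used                 # 80 / 50
shortfall = cluster_used - requested                # how far over the request

print(f"overhead: {overhead:.0f} GB, factor: {factor:.2f}, "
      f"over request by: {shortfall:.0f} GB")
print(f"padded request: {padded_request_gb(stdout_used, factor):.0f} GB")
```

If the overhead turns out to be closer to a fixed cost per node than a fixed ratio, an additive rule (standard-out estimate plus a measured per-node overhead) would be the more appropriate sketch.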

Comments

1. Erica Kaminski -- 11 years ago

Document memory queries?