wiki:u/erica/memoryBaseline

Version 12 (modified by Erica Kaminski, 11 years ago)

I am running a simulation of the colliding flows module, with all of the physics I will be using turned on, to get an idea of the baseline memory usage for this module.

I extended the end time to t=1000 to get a longer run on the processors. On bamboo, I ran the job on 8 processors and monitored it with top. The VIRT column shows the memory usage to be 186 MB per process. The data files are attached to this page. The build directory is /bamboodata/erica/CollidingFlows/scrambler_3.0.
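A snapshot like the one above can be taken from the command line (a sketch; the mpirun line is commented out and a `sleep` process stands in for astrobear here, so the commands can be tried anywhere):

```shell
# In the real run this would be, e.g.:
#   mpirun -np 8 ./astrobear &
# A sleep process stands in so the snapshot commands can be tested anywhere.
sleep 30 &
pid=$!

# One batch-mode snapshot of per-process memory.
# VSZ corresponds to top's VIRT column; RSS is the resident set (both in KB).
ps -C sleep -o pid,vsz,rss,comm

kill $pid
```

Note that RSS (top's RES) is often a better estimate of physical memory actually in use than VIRT, which also counts mapped-but-untouched pages.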

To test the difference in memory usage with the different physics on/off, I compiled astrobear with and without hypre. Self-gravity is the only physics expected to produce a change in memory usage, since it is the only one that uses external libraries.

||= Self-gravity flag =||= Machine =||= Processors =||= Memory per process (MB) =||
|| 1 (on) || Bamboo || 8 || 186 ||
|| 0 (off) || Bamboo || 8 || N/A (bug when compiling with the flag set to 0) ||
|| 1 (on) || Bluehive (debug queue) || 8 || 196 ||

(Note: all runs used the same data files.)

Results:

  1. The baseline memory can differ between machines, though probably not by much.
  2. The astrobear executable on disk (ls -lh astrobear) is 7 MB on bluehive and 6 MB on bamboo.
  3. The baseline memory is more than 20x the size of astrobear on disk. That is, in addition to holding the executable in memory on each processor, an additional ~180 MB is loaded per processor. This could be due to shared libraries, MPI, etc.?
  4. Therefore, in addition to the total memory used for info allocations (reported in astrobear.log), one should also add this "baseline" memory when estimating the total memory a simulation uses. For example, if my baseline is 0.2 GB/processor, then the baseline is 1.6 GB per node with 8 processors. We can use that to understand the total memory usage reported in bluehive.log at the end of a simulation, as the following tables show.
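The per-node arithmetic in point 4 is just baseline GB/processor times processors/node, which can be checked at the command line (a sketch using the example values from this page):

```shell
# Per-node baseline = baseline per processor (GB) x processors per node
awk 'BEGIN { printf "%.1f GB per node\n", 0.2 * 8 }'
```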

For total info allocations reported in astrobear.log as ~200 MB (~0.2 GB), on 1 node/8 processors, bluehive.log reports:

||= Resource =||= Requested =||= Used =||
|| memory (GB) || 16.0 || 2.0 ||

For total info allocations reported as ~200 MB (~0.2 GB), on 2 nodes/16 processors:

||= Resource =||= Requested =||= Used =||
|| memory (GB) || 16.0 || 3.7 ||

For total info allocations reported as 50 GB, on 8 nodes/64 processors:

||= Resource =||= Requested =||= Used =||
|| memory (GB) || 40 || 80 ||

Now, to explain the difference between the total info allocations reported in astrobear.log and the memory the cluster reports as actually used in bluehive.log, we add the baseline memory to the info allocations. This gives 0.2 + 1.6 = 1.8 GB, 0.2 + 2*1.6 = 3.4 GB, and 50 + 8*1.6 = 62.8 GB. The gap between info+baseline and the actual used memory is largest for the simulation running on the most nodes (62.8 GB estimated vs. 80 GB used). The remaining difference might be due to redundant calculations (ghost cells) and parallelization overhead?
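The estimate above (info allocations plus nodes times the 1.6 GB per-node baseline) can be checked for all three runs at once (a sketch; the input triples are the info GB, node count, and used GB from the tables on this page):

```shell
# For each run: estimated total = info allocations + nodes x 1.6 GB baseline,
# compared against the "Used" value from bluehive.log.
printf '%s\n' "0.2 1 2.0" "0.2 2 3.7" "50 8 80" |
awk '{ est = $1 + $2 * 1.6
       printf "%d node(s): estimated %.1f GB, used %.1f GB\n", $2, est, $3 }'
```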

Attachments (5)
