wiki:Tutorials/JobSizes

Version 8 (modified by Erica Kaminski, 11 years ago)

Allowable Job Sizes

Introduction

Job Size (RAM)

Understanding the feasibility of the desired size of a simulation is important. Doing so allows the user to avoid overly long simulations, avoid memory allocation errors, or rework the simulation so that it becomes feasible. Options include reducing the size of the domain, reducing the number of tracer variables, or even casting the simulation into a different reference frame, such as a shock-front rest frame.

It is more complicated to calculate the expected cost of an AMR simulation than of a fixed-grid one, so we discuss fixed-grid first.

Wikipedia can introduce you to the concept of double precision, which is how AstroBEAR represents its state variables (density, momentum, energy).

Each double precision number occupies 8 bytes. Thus, you can calculate how much memory Mtot a fixed-grid simulation requires simply by adding up all the numbers it stores:

    Mtot = Ncells × (Nqvars + Nauxvars) × 8 bytes,

where Ncells is the total number of cells, Nauxvars is nonzero only with MHD (5 extra variables in 2D, 12 in 3D), and Nqvars counts the state variables plus any tracer variables. In hydro there are 4 state variables in 2D and 2.5D, and 5 in both 2.5D+angular momentum and 3D; in MHD there are always 8, plus the 5 or 12 aux fields in 2D and 3D, respectively.
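As a quick sanity check of this bookkeeping (the function name and layout are illustrative, not AstroBEAR code):

```python
# Estimate the memory footprint of a fixed-grid run:
# Mtot = Ncells * (Nqvars + Nauxvars) * 8 bytes (double precision).

def fixed_grid_memory_bytes(n_cells, n_qvars, n_auxvars=0):
    """Total RAM in bytes for one copy of the state arrays."""
    return n_cells * (n_qvars + n_auxvars) * 8

# 3D MHD on a 256^3 grid: 8 state variables + 12 aux fields = 20 doubles/cell.
mem = fixed_grid_memory_bytes(256**3, 8, 12)
print(mem / 2**30, "GiB")  # 2.5 GiB, matching the 256^3 MHD+aux table entry
```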

Luckily, AstroBEAR takes advantage of distributed memory: the more compute nodes that are allocated, the more total memory it can access. In practice, allocating more nodes is the default approach when a simulation is very large.
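A minimal sketch of sizing such a distributed run, assuming the state arrays only need to fit in the aggregate RAM of the allocated nodes (names are illustrative):

```python
import math

def nodes_needed(total_mem_bytes, ram_per_node_bytes):
    """Minimum node count whose aggregate RAM covers the state arrays."""
    return math.ceil(total_mem_bytes / ram_per_node_bytes)

# A 1024^3 MHD+aux run (20 doubles/cell) holds ~160 GiB of state data;
# with 16 GiB of RAM per node it needs at least:
total = 1024**3 * 20 * 8
print(nodes_needed(total, 16 * 2**30))  # 10
```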

Examples of RAM Requirements by Problem Size

Below I have bolded three typical sizes to aid the eye.

| Grid size (2D : 3D) | 1 (1D advection) | 4 (2D hydro, min.) | 5 (3D hydro, min.) | 8 (MHD, min.) | 13 (2D MHD+aux) | 20 (3D MHD+aux) |
| 16² : 6³ | 2 KB | 8 KB | 10 KB | 16 KB | 26 KB | 40 KB |
| 32² : 10³ | 8 KB | 32 KB | 40 KB | 64 KB | 104 KB | 160 KB |
| 64² : 16³ | 32 KB | 128 KB | 160 KB | 256 KB | 416 KB | 640 KB |
| 128² : 25³ | 128 KB | 512 KB | 640 KB | 1024 KB | 1.63 MB | 2.5 MB |
| 256² : 40³ | 512 KB | 2 MB | 2.5 MB | 4 MB | 6.5 MB | 10 MB |
| 512² : 64³ | 2 MB | 8 MB | 10 MB | 16 MB | 26 MB | 40 MB |
| 1,024² : 102³ | 8 MB | 32 MB | 40 MB | 64 MB | 104 MB | 160 MB |
| 2,048² : 161³ | 32 MB | 128 MB | 160 MB | 256 MB | 416 MB | 640 MB |
| 4,096² : 256³ | 128 MB | 512 MB | 640 MB | 1024 MB | 1.63 GB | 2.5 GB |
| 8,192² : 406³ | 512 MB | 2 GB | 2.5 GB | 4 GB | 6.5 GB | 10 GB |
| 16,384² : 645³ | 2 GB | 8 GB | 10 GB | 16 GB | 26 GB | 40 GB |
| 32,768² : 1,024³ | 8 GB | 32 GB | 40 GB | 64 GB | 104 GB | 160 GB |
| 65,536² : 1,625³ | 32 GB | 128 GB | 160 GB | 256 GB | 416 GB | 640 GB |
| 131,072² : 2,580³ | 128 GB | 512 GB | 640 GB | 1024 GB | 1.63 TB | 2.5 TB |
| 262,144² : 4,096³ | 512 GB | 2 TB | 2.5 TB | 4 TB | 6.5 TB | 10 TB |
| 524,288² : 6,502³ | 2 TB | 8 TB | 10 TB | 16 TB | 26 TB | 40 TB |
| 1,048,576² : 10,321³ | 8 TB | 32 TB | 40 TB | 64 TB | 104 TB | 160 TB |
| 2,097,152² : 16,384³ | 32 TB | 128 TB | 160 TB | 256 TB | 416 TB | 640 TB |
| 4,194,304² : 26,008³ | 128 TB | 512 TB | 640 TB | 1024 TB | 1.63 PB | 2.5 PB |
| 8,388,608² : 41,285³ | 512 TB | 2 PB | 2.5 PB | 4 PB | 6.5 PB | 10 PB |
| 16,777,216² : 65,536³ | 2 PB | 8 PB | 10 PB | 16 PB | 26 PB | 40 PB |
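The left column pairs each 2D grid N² with the 3D grid M³ holding roughly the same number of cells. A small helper (illustrative, not part of AstroBEAR) reproduces the pairing:

```python
# Pair a 2D grid N^2 with the 3D grid M^3 of comparable cell count:
# M = round((N^2)^(1/3)).

def equivalent_3d_side(n_2d):
    return round((n_2d ** 2) ** (1 / 3))

print(equivalent_3d_side(128))    # 25
print(equivalent_3d_side(16384))  # 645
```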

Job Length (Time)

The length of time a simulation will take to run is based on its dynamics. In particular, when a user specifies an end time they are implicitly specifying a certain number of crossing times, or number of times that information could cross the domain.

In the simple case of hydro with no elliptic components, the crossing time is set by the maximum acoustic wave speed vcs,max in the grid:

    tcross = L / vcs,max,   where   vcs = sqrt(γ p / ρ),

and where L is the length of the domain, p is the pressure, γ is the adiabatic index (ratio of specific heats), and ρ the mass density.

This maximum speed (and its equivalents in MHD or with elliptic terms) determines the largest timestep that the code can take, bounded by the CFL number C:

    Δt ≤ C Δx / vcs,max.

Thus, one can estimate the number of timesteps needed to reach the end time tfinal of the simulation,

    Nsteps ≈ tfinal / Δt,

meaning that the wall-clock time to finish the simulation is then approximately

    Twall ≈ Nsteps × dTnow,

where dTnow is the wall-clock time it took to finish the present timestep.
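These estimates can be sketched numerically (a hypothetical helper, not AstroBEAR code):

```python
# Nsteps ~ (tfinal - tnow) / dt, then Twall ~ Nsteps * dTnow,
# where dTnow is the measured wall-clock seconds per timestep.

def wall_time_estimate(t_final, t_now, dt, seconds_per_step):
    n_steps = (t_final - t_now) / dt
    return n_steps * seconds_per_step

# 1000 steps left at 3.6 s each is about an hour of wall-clock time:
print(wall_time_estimate(2.0, 1.0, 1e-3, 3.6) / 3600, "hours")  # 1.0 hours
```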


In practice, you're best off guessing from the current timestep:

    Twall ≈ (tfinal - tnow) / Δtnow × dTnow.

Unfortunately, in reality the maximum wave speed may vary wildly over the course of a simulation, so such approximations should be taken with a grain of salt.

In practice, you can quickly get an idea of how long your simulation will take by looking at the wall-clock time between data output frames. In most cases there are many individual timesteps between data frames, so short-term variations in timestep length are smoothed out. This provides a fairly good indicator of whether the time between frames will decrease, increase, or stay the same.
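A sketch of this frame-based estimate, assuming the recent frame-to-frame wall times are representative (names are illustrative):

```python
# Project the remaining wall-clock time from recent frame intervals.

def remaining_from_frames(frame_walltimes, frames_left):
    """Mean recent seconds-per-frame times the number of frames still due."""
    per_frame = sum(frame_walltimes) / len(frame_walltimes)
    return per_frame * frames_left

# Last three frames took ~5 minutes each; 20 frames remain:
print(remaining_from_frames([290, 300, 310], 20))  # 6000.0 seconds
```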

Maximum Job Sizes by Machine, Fixed-Grid

RAM is most often given in GB.

| Machine | cores/node | RAM/node | RAM/core (max.) | RAM/core (min.) |
| Bluehive, general queue | 8 | 16GB | 16GB | 2GB |
| Bluehive, afrank queue (a) | 8 | 12GB/8GB | 12GB/8GB | 1.5GB/1GB |
| Bluegene (b) | 64 | 128GB | 2GB | 0.5GB |

Notes: a) As of 11/2010, there are 6 afrank nodes: 3 with 12GB RAM and 2 with 8GB RAM. b) Bluegene/p actually has quad-core nodes with 2GB RAM each, but nodes can only be requested in packs of 64, so you can treat each pack of 64 individual nodes as a single "node" with 64 × 2GB = 128GB RAM. You can, however, specify 1-4 of those cores per node, giving you anywhere from 2GB RAM/core (1 core/node) down to 0.5GB RAM/core (4 cores/node).

Maximum Job Sizes on Bluehive

While one could take the above table and compute theoretical upper limits, in practice there will be other demands on RAM. The table below therefore assumes that only 80% of the theoretical maximum number of cells is usable.

In practice, with AstroBEAR the limits are substantially less than even those given here.

| Nodes (procs) | 1 (1D advection) | 4 (2D hydro, min.) | 5 (3D hydro, min.) | 8 (MHD, min.) | 13 (2D MHD+aux) | 20 (3D MHD+aux) |
| 1 node (8 procs) | 1198³ | 754³ | 700³ | 599³ | 509³ | 441³ |
| 4 nodes (32 procs) | 1901³ | 1198³ | 1112³ | 951³ | 809³ | 700³ |
| 8 nodes (64 procs) | 2395³ | 1509³ | 1401³ | 1198³ | 1019³ | 882³ |
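The entries above follow from the 80% factor, assuming binary GiB and rounding to the nearest integer side length (the helper below is illustrative, not AstroBEAR code):

```python
# Largest cubic grid whose state arrays fit in 80% of the available RAM:
# N^3 * nvars * 8 bytes <= 0.8 * RAM.

def max_cube_side(total_ram_gib, nvars, fraction=0.8):
    cells = fraction * total_ram_gib * 2**30 / (8 * nvars)
    return round(cells ** (1 / 3))

print(max_cube_side(16, 1))      # 1198: one 16 GiB node, 1-variable column
print(max_cube_side(4 * 16, 4))  # 1198: four nodes, 2D hydro column
```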

Maximum Job Sizes on Bluegene/p

This table also includes the 80% factor.

| "Nodes" | 1 (1D advection) | 4 (2D hydro, min.) | 5 (3D hydro, min.) | 8 (MHD, min.) | 13 (2D MHD+aux) | 20 (3D MHD+aux) |
| 1 "node" (64 nodes/256 procs) | 2395³ | 1509³ | 1401³ | 1198³ | 1019³ | 882³ |
| 2 "nodes" | 3018³ | 1901³ | 1765³ | 1509³ | 1284³ | 1112³ |
| 8 "nodes" (half-machine) | 4791³ | 3018³ | 2802³ | 2395³ | 2037³ | 1765³ |
| 16 "nodes" (all-machine) | 6036³ | 3802³ | 3530³ | 3018³ | 2567³ | 2224³ |


Maximum Job Sizes by Machine, AMR

The case with AMR is complicated by the fact that multiple grids on different levels may represent the same physical space in the domain. One therefore may not know ahead of time exactly how many cells will exist in the simulation at a given time.

(table coming soon?)


Maximum Job Length by Machine

See the CRC's (or your own institution's) webpages for the maximum allowable job lengths by machine. In cases where the simulation is expected to run longer than this, the user must of course ensure that at least one data frame can be written in this time.

Ideally, the user would set up data frames so that the last ones are created only shortly before the job is killed (so as to not waste computation time).
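One way to pick the output cadence, assuming a known wall-clock limit and a steady time per frame (the function name and 10% safety margin are illustrative choices, not AstroBEAR settings):

```python
import math

def frames_per_job(walltime_limit_s, seconds_per_frame, margin=0.9):
    """Frames that fit in one job, keeping a safety margin before the kill."""
    return math.floor(margin * walltime_limit_s / seconds_per_frame)

# 24-hour queue limit, ~2 hours of wall time per frame, 10% margin:
print(frames_per_job(24 * 3600, 2 * 3600))  # 10
```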
