= Introducing AstroBEAR 2.0 =

== AstroBEAR In Brief ==

The growing size of scientific simulations can no longer be accommodated simply by increasing the number of nodes in a cluster. Completing larger jobs without increasing the wall time requires a decrease in the workload per processor (i.e., increased parallelism). Unfortunately, increased parallelism often leads to increased communication time. Minimizing the cost of this communication requires efficient parallel algorithms to manage the distributed AMR structure and calculations.

AstroBEAR's strength lies in its distributed tree structure. Many AMR codes replicate the entire AMR tree on every computational node, an approach that incurs a heavy communication cost as the tree continuously broadcasts structural changes to all processors. AstroBEAR, on the other hand, keeps only as much tree information as the local grids need in order to communicate with processors containing nearby grid regions. This saves both memory and communication time, leaving us well positioned to take advantage of massively parallel low-memory architectures such as [http://en.wikipedia.org/wiki/Blue_Gene BlueGene] systems and [http://www.nvidia.com/object/GPU_Computing.html GPUs].

In our distributed tree system, processors can have children and parents in much the same way that grids do. All of a processor's tree information (grids, overlaps, neighbors, and processors) comes from its parent processor. Thus, each processor only needs to communicate with its parent and with the processors whose data directly interact with its own. Grids are distributed among processors via a [http://en.wikipedia.org/wiki/Hilbert_curve Hilbert ordering], allowing AstroBEAR to take full advantage of the cluster's topology.

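
To illustrate how a Hilbert ordering keeps spatially adjacent grids on nearby processors, here is a minimal Python sketch. It is not AstroBEAR's implementation; the `pos` and `cost` fields and the contiguous-chunk assignment rule are invented for the example. The index function is the classic bit-twiddling form of the Hilbert curve mapping.

```python
def hilbert_index(n, x, y):
    """Map cell (x, y) on an n-by-n grid (n a power of 2) to its
    position along the Hilbert space-filling curve."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate the quadrant so the sub-curve has standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def distribute(grids, nprocs, n):
    """Assign grids (dicts with hypothetical 'pos' and 'cost' fields) to
    processors as contiguous runs along the Hilbert curve, so spatially
    adjacent grids usually land on the same or a nearby processor."""
    ordered = sorted(grids, key=lambda g: hilbert_index(n, *g["pos"]))
    total = sum(g["cost"] for g in ordered)
    assignment, acc, rank = [], 0.0, 0
    for g in ordered:
        # Advance to the next rank once its fair share of work is filled.
        if acc >= total * (rank + 1) / nprocs and rank < nprocs - 1:
            rank += 1
        assignment.append(rank)
        acc += g["cost"]
    return ordered, assignment
```

Because each processor receives a contiguous segment of the curve, its grids form a compact region of the domain, which is what lets the scheme exploit network topology.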
Under this new system, a processor has four classes of processors with which it interacts:
* ''Parent'': The processor associated with the parents of the local processor's grids.
* ''Child'': A processor associated with the children of the local processor's grids.
* ''Neighbor'': A processor associated with one or more grids adjacent to the local processor's grids.
* ''Overlap'': As an AMR simulation evolves, grids are constantly created and destroyed as the refined regions change. An overlap processor is associated with grids from the previous step that overlap the local processor's current grids.

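
A processor's local view of these four classes can be pictured as a small record. The sketch below is illustrative Python, not AstroBEAR's Fortran data structures, and all field names are invented for the example:

```python
from dataclasses import dataclass, field
from typing import Optional, Set

@dataclass
class ProcTree:
    """Minimal sketch of one processor's local slice of the distributed
    tree: only the ranks it actually exchanges data with, never a
    global list of all processors."""
    rank: int
    parent: Optional[int] = None                       # owns our grids' parents
    children: Set[int] = field(default_factory=set)    # own our grids' children
    neighbors: Set[int] = field(default_factory=set)   # own adjacent grids
    overlaps: Set[int] = field(default_factory=set)    # owned overlapping grids last step

    def communication_partners(self) -> Set[int]:
        """Every rank this processor needs to talk to this step."""
        partners = self.children | self.neighbors | self.overlaps
        if self.parent is not None:
            partners.add(self.parent)
        return partners
```

The point of the structure is that `communication_partners` stays small and local regardless of the total processor count, which is what avoids global broadcasts.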
Workload in AstroBEAR is managed through careful work scheduling and distributed load-balance calculations. AstroBEAR computes each processor's share of a given level's workload from the total workload for that level and the processor's remaining workload from the previous level. This allows the workload to be balanced globally rather than level by level. As a result, AstroBEAR can handle simulations with extremely coarse base grids and many levels of AMR.

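
The cross-level balancing idea can be sketched as follows: a processor still carrying work from coarser levels is handed a smaller share of the next level. This is a simplified model, not AstroBEAR's scheduler, and all names are illustrative:

```python
def level_targets(level_work, remaining, nprocs):
    """Compute each processor's target share of a level's workload,
    discounted by the work it still carries from coarser levels, so the
    TOTAL load per processor evens out across levels."""
    # remaining[p] = unfinished work processor p carries from previous levels
    total = level_work + sum(remaining)
    fair = total / nprocs                      # ideal total load per processor
    targets = [max(0.0, fair - r) for r in remaining]
    # Renormalize so the targets sum to exactly this level's workload.
    scale = level_work / sum(targets) if sum(targets) > 0 else 0.0
    return [t * scale for t in targets]
```

For example, with 100 units of fine-level work on 4 processors, two of which still hold 20 units each of coarse-level work, the busy processors receive 15 units apiece and the idle ones 35, so every processor ends up with 35 units in total.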
{{{
#!comment
Additionally, Scrambler uses a distributed control structure that mirrors the nested AMR grid hierarchy. Processors have child processors just as AMR grids have child grids. These child processors receive new grids, their own new child processors, and all necessary tree information from their parent. This eliminates the need for ANY global communication. Processors only need to communicate with parent processors (processors containing parent grids), neighbor processors (processors containing adjacent grids), overlapping processors (processors containing previous AMR grids that overlap with the processor's current grids), and child processors (processors assigned to child grids). This does present a challenge for load balancing, since the total current workload for any given level can only be determined through collective communications after every grid has created children. However, since regions of refinement in AMR simulations typically change slowly, it is possible to use the previous regions of refinement to predict the future regions of refinement and the amount of resources that should be allocated to any given region. This allows the distribution to be parallelized as well. Additionally, the allocation of resources among child grids is done using a Hilbert space-filling curve. This allows neighboring processors to be physically close on the network (or on the same core) and allows Scrambler to take advantage of the network topology.

In addition to being completely parallelized between processors, Scrambler makes use of threading to allow multiple AMR levels to advance in parallel. This allows for total load balancing instead of balancing each level of the AMR hierarchy separately. This becomes especially important for deep simulations (low filling fractions but many levels of AMR) as opposed to shallow simulations (high filling fractions and only a few levels of AMR). Processors with coarse grids can advance their grids while processors with finer grids advance theirs. Without this capability, base grids would need to be large enough to be distributed across all of the processors, and for simulations with large base grids to finish in a reasonable wall time, only a few levels of AMR could be used. With threading this restriction is lifted.
[[Image(threading2.png)]]
}}}

=== Walk Throughs ===
* [AstroBearAmr Adaptive Mesh Refinement implementation in AstroBEAR]
* [DistributedTree How Scrambler manages the distributed tree]
* [DistributedControl How Scrambler manages the distributed control]
* [ScramblerComms How Scrambler schedules and performs communication]
* [LoadBalancing How Scrambler balances the work load]
* [ScramblerThreading How Scrambler handles threading]
* [ScramblerScheduling How Scrambler handles scheduling for pseudo-threading]
* [OptimizingScrambler Performance issues related to Scrambler]
* [BreakDown How processors spend their time]
* [ShellScheme Implementation of the shell algorithm]
* [ScanScheme Implementation of the scan algorithm]
* [GhostCells Managing ghost cells]
* [ComputationalStencils Stencils]
* SuperGrids