wiki:AstroBearAmr

Version 14 (modified by Jonathan, 12 years ago) ( diff )

AMR Explained

At first glance the AMR routine in amr_control.f90 is a little intimidating, but it can be understood by examining the most basic features first.


Round 1: A Simple, Single-Processor AMR Algorithm

First we'll start with a simplified version of the AMR algorithm. This version focuses only on the most essential features:

RECURSIVE SUBROUTINE AMR(n)
   INTEGER :: n, nSteps, step
   nSteps = 2
   CALL InitInfos(n)
   CALL ProlongateParentsData(n)
   DO step=1,nSteps
      levels(n)%step=step
      IF (step == 2) CALL UpdateOverlaps(n)
      CALL ApplyOverlaps(n,step)
      CALL ApplyPhysicalBCs(n)
      CALL SetErrFlags(n)
      IF (step == 2) CALL AgeNodesChildren(n)
      CALL BackupNodes(n+1)
      CALL CreateChildrens(n)
      IF (step == 1) THEN
         CALL InheritOldNodeOverlapsChildren(n)
         CALL InheritNewNodeOverlapsChildren(n)
      ELSE
         CALL InheritOverlapsOldChildren(n)
         CALL InheritOverlapsNewChildren(n)
      END IF
      CALL InheritNeighborsChildren(n)
      CALL AdvanceGrids(n)
      CALL AMR(n+1)
      CALL ApplyChildrenData(n)
      CALL SyncFluxes(n)
      CALL AccumulateFluxes(n)
      IF (step == 2) CALL NullifyNeighbors(n)
   END DO
   CALL CoarsenDataForParents(n)
END SUBROUTINE AMR

In this example, the parameter n represents the current level of the operation.

Remember as we step through the AMR() subroutine that it is recursive. At each step on level n, AMR(n) within the DO nSteps loop). calls itself on the next level up (see the calls to AMR(n+1) during each step on level n. This is because for each step on level n >= 0, there are two steps on level n+1, and the levels above the base level are regridded at each step.

IMPORTANT: Because AMR() is recursive, each call to AMR() at level n assumes that certain steps were carried out by the call at level -1. This can make the structure of AMR() a little confusing, especially once parallelization is included.

AMR(n) calls two subroutines before stepping through the simulation to finish initializing level n's data:

  • InitInfos(n) — Allocates grid data (InfoDef) structures for the grids on level n. Note that the tree structure and grid dimensions were created on the previous level n-1; InitInfos() just creates the data structures they reference.
  • ProlongateParentsData(n) — Populates the level n data structures with prolongated data from their parents on level n-1.

Once the data has been constructed, we can begin the work of advancing the simulation. Each level n will take two steps for every single step taken by n-1, but the process is slightly different on each step for levels above the base level.

Step 1

  • ApplyOverlaps(n, step) — After initializing level n with prolongated data from level n-1, AMR() copies over data from the previous generation of grids on level n. This higher-resolution data is preferable to data prolongated from the previous level, so AMR() uses it wherever it is available.
  • ApplyPhysicalBCs(n) — Apply physical boundary conditions to level n.
  • SetErrFlags(n) — Determine which regions to refine. Refinement regions are determined by the physical processes involved, as well as specific conditions imposed by the problem modules.
  • BackupNodes(n+1) — Caches the nodes on the child level n+1. We are about to create the new level n+1 nodes, and the data referenced by the backed-up nodes will be used when ApplyOverlaps(n+1) is called (see above).
  • CreateChildrens(n)— This routine creates child nodes on level n+1 using the refinement flags set on level n. The use of "Childrens" here is not a typo; CreateChildrens() is so named because it applies the CreateChildren(Info) subroutine to each grid on level n.
  • InheritOldNodeOverlapsChildren(n) — Nested grids mean that spatial relationships (overlaps and neighbors) are inherited from parent grids. This routine passes information about the previous generation of n+1 grids to the new grids created by CreateChildrens(n). This routine is only executed on step 1 of the AMR() execution loop.
  • InheritNewNodeOverlapsChildren(n) — The children of previous level n grids will also need to send their data to the children of new level n grids.
  • InheritNeighborsChildren(n) — The children of neighboring grids on level n will likely be neighbors on level n+1. This routine passes neighbor information from level n to level n+1.
  • AdvanceGrids(n) — Performs the hyperbolic advance step on the grids of level n. This is where our numerical solvers come in.
  • AMR(n+1) — Launches AMR routine on the child level
  • ApplyChildrenData(n) — The inverse of ProlongateParentData(), this routine restricts data from the child grids onto their parent grids, providing a more accurate solution on the coarser level.
  • SyncFluxes(n) — To enforce mass conservation and the DivB constraint, the fluxes at grid boundaries need to be synchronized.
  • AccumulateFluxes(n) — Accumulates the fluxes used on level n to send back to the parent grids on level n-1.

Step 2

  • UpdateOverlaps(n) — On the second step we don't need to receive overlap data from the previous generation of grids. We do, however, need to 'ghost' data with our current overlap grids or neighbors. So we treat our neighbors as our current overlaps; in effect, we use the same node list for neighbor operations and overlap operations. This is the reason why we use the NullifyNeighbors() routine later on instead of deleting the node list.
  • ApplyOverlaps(n, step) - Brings over ghost data from the neighbor grids, which are the current overlaps. This neighbor grid data has been advanced by the AdvanceGrids() subroutine, and thus is more accurate than the extrapolated data within the grid's ghost zones.
  • ApplyPhysicalBCs(n) — Apply physical boundary conditions to level n.
  • SetErrFlags(n) — Determine which regions to refine. Refinement regions are determined by the physical processes involved, as well as specific conditions imposed by the problem modules.
  • AgeNodesChildren(n) - Because of the nested grids giving us inherited relationships, we need to backup the relationships connecting us to the previous child grids on level n+1, as well as backing up the nodes themselves.
  • BackupNodes(n+1) — Caches the nodes on the child level n+1. We are about to create the new level n+1 nodes, and the data referenced by the backed-up nodes will be used when ApplyOverlaps(n+1) is called (see above).
  • CreateChildrens(n)— This routine creates child nodes on level n+1 using the refinement flags set on level n. The use of "Childrens" here is not a typo; CreateChildrens() is so named because it applies the CreateChildren(Info) subroutine to each grid on level n.
  • InheritOverlapsOldChildren(n) — Nested grids mean that spatial relationships (overlaps/neighbors) are inherited from parent grids. On the second step the previous generation of level n+1 grids are the old children of the current generation of level n grids.
  • InheritOverlapsNewChildren(n) — This inherits the relationships going the other way. The old children of level n grids will need to send their data to the new children of level n grids.
  • InheritNeighborsChildren(n) — The children of neighboring grids on level n will likely be neighbors on level n+1. This routine passes neighbor information from level n to level n+1.
  • AdvanceGrids(n) — Performs the hyperbolic advance step on the grids of level n. This is where our numerical solvers come in.
  • AMR(n+1) — Launches AMR routine on the child level
  • ApplyChildrenData(n) — The inverse of ProlongateParentData(), this routine restricts data from the child grids onto their parent grids, providing a more accurate solution on the coarser level.
  • SyncFluxes(n) — To enforce mass conservation and the DivB constraint, the fluxes at grid boundaries need to be synchronized.
  • AccumulateFluxes(n) — Accumulates the fluxes used on level n to send back to the parent grids on level n-1.
  • NullifyNeighbors(n) — On the second step, each node's neighbor list and overlap list is pointing to the same list (the node's neighbor list). This routine nullifies the neighbor list pointers without destroying the nodes they point to. In effect, this turns the current generation's neighbor lists into the next generation's overlap lists. On the next step, the BackUpNodes() routine will destroy the overlap nodes.
  • CoarsenDataForParent(n) — Now that both advance steps are complete for this level, it's time to coarsen the cell-centered data back down to the n-1 level grids.


Round 2: Refined Bookkeeping and Elliptic Solvers

Now that we have constructed a simple AMR routine, we will add a few routines to improve MHD calculations and simplify AMR tree management. We will also add sink particles and the elliptic solver step to the code, expanding the capabilities of our AMR algorithm.

RECURSIVE SUBROUTINE AMR(n)
   INTEGER :: n, nSteps, step
   nSteps = 2
   CALL InitInfos(n)
   CALL ProlongateParentsData(n)
   CALL ChildMaskOverlaps(n)
   DO step=1,nSteps
      levels(n)%step=step
      IF (step == 2) CALL UpdateOverlaps(n)
      CALL ApplyOverlaps(n,step)
      CALL ProlongationFixups(n)
      IF (lParticles) CALL ParticleUpdate(n)
      CALL ApplyPhysicalBCs(n)
      CALL SetErrFlags(n)
      IF (step == 2) CALL AgeNodesChildren(n)
      CALL BackupNodes(n+1)
      CALL CreateChildrens(n)
      IF (step == 1) THEN
         CALL InheritOldNodeOverlapsChildren(n)
         CALL InheritNewNodeOverlapsChildren(n)
      ELSE
         CALL InheritOverlapsOldChildren(n)
         CALL InheritOverlapsNewChildren(n)
      END IF
      CALL InheritNeighborsChildren(n)
      CALL AdvanceGrids(n)
      IF (lElliptic) CALL Elliptic(n)
      CALL PrintAdvance(n)
      CALL AMR(n+1)
      CALL ApplyChildrenData(n)
      CALL RestrictionFixups(n)
      CALL AfterFixups(n)
      CALL UpdateChildMasks(n)
      CALL SyncFluxes(n)
      CALL AccumulateFluxes(n)
      IF (step == 2) CALL NullifyNeighbors(n)
   END DO
   CALL CoarsenDataForParents(n)
END SUBROUTINE AMR
  • ParticleUpdate(n) — If there are sink particles then we update the particles here.
  • Elliptic(n) — If elliptic equations are being used, then the elliptic step is performed here.
  • PrintAdvance(n) — Just prints the 'Advancing level n …' line to the standard output (stdout). This routine has two collective communications in it, so it can be a bottleneck on a cluster with slow network connections between its nodes.
  • ProlongationFixups(n) — It is better to complete the prolongation of the aux fields after receiving overlaps. This guarantees that child grids have divergence-free fields consistent with both their neighbors and their parents.
  • ChildMaskOverlaps(n) — This sets the ChildMask array values to 0 for ghost cells that are refined by neighbors.
  • UpdateChildMask (n)— This sets the ChildMask array values to 1 in grid cells that are refined by the grid's own children. This also sets ChildMask to NEIGHBOR_CHILD in grid cells that are refined by a neighbor's children.
  • RestrictionFixups(n) — This updates cell-centered representations of aux fields after receiving restricted data from children.
  • AfterFixups(n) — This allows for user-defined routines to be applied after a grid has been fully updated. This is not to be confused with the AfterStep() routine, which is executed after the hyperbolic step is completed.


Round 3: The Maximum Level of Refinement

Up to this point we've assumed we are on an intermediate level of the AMR tree, with a level above us and a level below us. What is different if we are on the highest level MaxLevel?

RECURSIVE SUBROUTINE AMR(n)
   INTEGER :: n, nSteps, step
   nSteps = 2
   CALL InitInfos(n)
   CALL ProlongateParentsData(n)
   CALL ChildMaskOverlaps(n)
   DO step=1,nSteps
      levels(n)%step=step
      IF (step == 2) CALL UpdateOverlaps(n)
      CALL ApplyOverlaps(n,step)
      CALL ProlongationFixups(n)
      IF (lParticles) CALL ParticleUpdate(n)
      CALL ApplyPhysicalBCs(n)
      IF (n < MaxLevel) THEN
         CALL SetErrFlags(n)
         IF (step == 2 CALL AgeNodesChildren(n)
         CALL BackupNodes(n+1)
         CALL CreateChildrens(n)
         IF (step == 1) THEN
            CALL InheritOldNodeOverlapsChildren(n)
            CALL InheritNewNodeOverlapsChildren(n)
         ELSE
            CALL InheritOverlapsOldChildren(n)
            CALL InheritOverlapsNewChildren(n)
         END IF
         CALL InheritNeighborsChildren(n)
      END IF
      CALL AdvanceGrids(n)
      IF (lElliptic) CALL Elliptic(n)
      CALL PrintAdvance(n)
      IF (n < MaxLevel) CALL AMR(n+1)
      IF (n < MaxLevel) CALL ApplyChildrenData(n)
      CALL RestrictionFixups(n)
      CALL AfterFixups(n)
      IF (n < MaxLevel) CALL UpdateChildMasks(n)
      CALL SyncFluxes(n)
      CALL AccumulateFluxes(n)
      IF (step == 2) CALL NullifyNeighbors(n)
   END DO
   CALL CoarsenDataForParents(n)
END SUBROUTINE AMR

Nodes on MaxLevel will not have any children, and thus MaxLevel is the stopping case for our recursive AMR algorithm. AMR() should therefore not call itself on MaxLevel + 1. Similarly, none of the functions that only apply to creating, initializing, or responding to child nodes should be on MaxLevel. This means no error flags, no child grid creation, and no restriction operations.

In the code block above, we've applied the conditional

If (n < MaxLevel)

to the following routines that deal with child nodes. This prevents these routines from being executed on the highest allowable level of refinement:

  • SetErrFlags(n)
  • AgeNodesChildren(n)
  • BackupNodes(n)
  • CreateChildrens(n)
  • Inherit*Overlaps(n) (all the different permutations of the InheritOverlaps functionality).
  • AMR(n)
  • ApplyChildrenData(n)
  • UpdateChildMask(n)


Round 4: Behavior on the Lower Levels

At the other end of the AMR hierarchy is level 0, the base grid. This is the coarsest resolution in the problem, the source of the "root time step" against which the higher level timesteps are judged. The base grid has no parent grid, so it shouldn't execute any restriction or fixup routines that would map its data to a lower level.

Unfortunately, this is also where things get complicated. AstroBEAR doesn't have any grids below level 0, but the AMR tree structure in AstroBEAR does have levels below 0. This is because AstroBEAR needs to distribute processors as well as grids, and it is easier to do so with a root node below the base level. In addition, the problem domain might be split into multiple subdomains with different conditions, some of which may span more than one processor on the base level. Adding an additional level of nodes makes it easier for AstroBEAR to keep track of these regions.

RECURSIVE SUBROUTINE AMR(n)
   INTEGER :: n, nSteps, step
   IF (n <= 0) nSteps=1
   IF (n >  0) nSteps = 2
   IF (n > -2) THEN
      CALL InitInfos(n)
      CALL ProlongateParentsData(n)
      IF (n > -1) CALL ChildMaskOverlaps(n)
   END IF
   DO step=1,nSteps
      levels(n)%step=step
      IF (step == 2) CALL UpdateOverlaps(n)
      IF (n > -2) CALL ApplyOverlaps(n,step)
      IF (n > 0) CALL ProlongationFixups(n)
      IF (n > -1 .AND. lParticles) CALL ParticleUpdate(n)
      IF (n > -1) CALL ApplyPhysicalBCs(n)
      END IF
      IF (n < MaxLevel) THEN
         IF (n > -1) THEN
            CALL SetErrFlags(n)
         END IF
         IF (step == 2 .OR. n == -2) THEN
            CALL AgeNodesChildren(n)
         END IF
         CALL BackupNodes(n+1)
         CALL CreateChildrens(n)
         IF (n == -2) THEN
            CALL InheritOverlapsOldChildren(n)
            CALL InheritNeighborsChildren(n)
            CALL InheritOverlapsNewChildren(n)
         ELSE
            IF (step == 1) THEN
               CALL InheritOldNodeOverlapsChildren(n)
               CALL InheritNewNodeOverlapsChildren(n)
               CALL InheritNeighborsChildren(n)
            ELSE
               CALL InheritOverlapsOldChildren(n)
               CALL InheritNeighborsChildren(n)
               CALL InheritOverlapsNewChildren(n)
            END IF
         END IF
      END IF
      IF (n > -1) THEN
         CALL AdvanceGrids(n)
         IF (lElliptic) CALL Elliptic(n)
         CALL PrintAdvance(n)
      END IF
      IF (n < MaxLevel) CALL AMR(n+1)
      IF (n < MaxLevel) CALL ApplyChildrenData(n)
      IF (n > -1) THEN
         CALL RestrictionFixups(n)
         CALL AfterFixups(n)
      END IF
      IF (n > -1) THEN
         IF (n < MaxLevel) CALL UpdateChildMasks(n)
         CALL SyncFluxes(n)
      END IF
      IF (n > 0) CALL AccumulateFluxes(n)
      IF (step == 2) CALL NullifyNeighbors(n)
   END DO
   IF (n > -2) CALL CoarsenDataForParents(n)
END SUBROUTINE AMR

Levels 0 and below

Level 0, the base level, represents the lowest level of hydrodynamic data. The grids on this layer have no parent grids, and thus have no need to prolongate or restrict data to them. Consequently, the following subroutines do not need to be called at the base level or below:

  • ProlongationFixups(n)
  • AccumulateFluxes(n)

Were in not for costmap data, these levels would neither need to call

  • ProlongateParentsData(n)
  • CoarsenDataForParents(n)

Levels -1 and below

Level -1 is the domain management level. Each subdomain of the problem is associated with a single level -1 node (single-subdomain problems only have one node on this level). Level -1 nodes and below do not need to call any routines related solely to hydrodynamic variables. In addition to the routines above, this includes:

  • ParticleUpdate(n)
  • ApplyPhysicalBCs(n)
  • SetErrFlags(n)
  • AdvanceGrids(n)
  • Elliptic(n)
  • PrintAdvance(n)
  • RestrictionFixups(n)
  • AfterFixups(n)
  • SyncFluxes(n)

Additionally ,since the entire domain is refined at the root level, levels below 0 do not need to maintain the childmask array. Consequently, these levels do not need to call:

  • ChildMaskOverlaps(n)
  • UpdateChildMasks(n)

Level -2

Level -2 is the root level, which ties together all the different subdomains. There is only ever one node on level -2, and it is always on the processor with MPI rank 0 (i.e., the master processor).

The Level -2 node is both persistent and parentless. As such, level -2 never calls:

  • InitInfos(n)
  • ApplyOverlaps(n)
  • ProlongateParentsData(n)
  • CoarsenDataForParents(n)

Finally since the level 2 is persistent, it behaves like a higher level grid in between steps so it always calls Level -2 behaves like a higher-level grid in between steps. Since it is persistent, it always calls the following routines:

  • AgeNodesChildren(n)
  • InheritOverlapsOldChildren(n)
  • InheritOverlapsNewChildren(n)
  • InheritNeighborsNewChildren(n)


Round 5: Communication

At this point, the basic AMR algorithm has been assembled, and we turn our attention to the code used to run AstroBEAR on more than one processor. While much of this code is invisible at the AMR() level, there are a few high-level routines included to manage large groups of messages.

Communication in AstroBEAR comes in two varieties: data communication, which is concerned with passing actual grid data between processors, and tree communication, which passes AMR tree data.

   RECURSIVE SUBROUTINE AMR(n)
     USE TreeLevelComms
     USE DataLevelComms
     INTEGER :: n, nSteps, step
     INTEGER :: iErr


     IF (n <= 0) nSteps=1 
     IF (n >  0) nSteps = 2 
     IF (n > BaseLevel) THEN
        CALL CompRecvGridsFromParents(n)
        CALL SortNodes(n)
        CALL CompSendGridsToChildren(n-1)         
        IF (n > -1) THEN
           CALL InitInfos(n)
           IF (n > 0)  CALL PostRecvParentsData(n)
        END IF
        CALL PostRecvOverlapsNeighbors(n)         
        CALL PostRecvOldNodeOverlaps(n)
        IF (n > 0) THEN
           CALL CompRecvParentsData(n)         
           CALL ProlongateParentsData(n)     
        END IF
        CALL CompRecvOverlapsNeighbors(n)        
        IF (n > -1)  CALL ChildMaskOverlaps(n)
        CALL CompRecvOldNodeOverlaps(n)        
     END IF

     DO step=1,nSteps
        levels(n)%CurrentLevelStep=levels(n)%CurrentLevelStep+1 
        levels(n)%step=step

        IF (step == 2) CALL UpdateOverlaps(n)
        IF (n > -1) THEN
           CALL GetLevelLoad(n)         
           CALL PostRecvOverlaps(n)        
           CALL PostSendOverlaps(n)        
           CALL ApplyOverlaps(n,step) 
           CALL CompRecvOverlaps(n)                    
        END IF

        IF (n > 0)  CALL AfterOverlaps(n)
        IF (n > -1) THEN
           CALL ParticleUpdate(n)
           CALL ApplyPhysicalBCs(n)
# if defined HYPRE
        CALL ApplyEllipticBC(n)
# endif
        END IF
        IF (n < MaxLevel) THEN 
           IF (n > -1)  CALL SetErrFlags(n)
           IF (step == 2 .OR. n <= BaseLevel)  CALL AgeNodesChildren(n)
           CALL AgeNodes(n+1)
           CALL CreateChildrens(n) 
           CALL DistributeChildrens(n)
           CALL PostSendGridsToChildren(n)         
           CALL PostRecvGridsFromParents(n+1)
           IF (n > -1)  CALL PostSendChildrenData(n)
           CALL PostRecvNeighboringChildren(n)
           CALL PostSendNeighboringChildren(n)
           IF (step == 1 .AND. n > BaseLevel) THEN
              CALL PostRecvOverlappingChildrenFromOldNodes(n)  
              CALL PostRecvOverlappingChildrenFromNewNodes(n)
              CALL PostSendOverlappingChildrenToOldNodes(n)    
              CALL PostSendOverlappingChildrenToNewNodes(n)
              CALL InheritOldNodeOverlapsChildren(n)
              CALL InheritNewNodeOverlapsChildren(n) 
              CALL InheritNeighborsChildren(n)
              CALL CompRecvOverlappingChildrenFromOldNodes(n)   
              CALL CompRecvOverlappingChildrenFromNewNodes(n)   
              CALL PostSendOverlapsToOldNodesChildren(n)
              CALL CompRecvNeighboringChildren(n)   
           ELSE
              CALL InheritOverlapsOldChildren(n) 
              CALL InheritNeighborsChildren(n)
              CALL CompRecvNeighboringChildren(n)   
              CALL InheritOverlapsNewChildren(n) 
              CALL PostSendOverlapsToNodesOldChildren(n)        
           END IF
           CALL PostSendOverlapsNeighbors(n)         
           IF (n > -1)  CALL PostRecvChildrenData(n)         
        END IF
        IF (n > -1)  CALL BeforeGlobalStep(n)

        IF (n < MaxLevel) THEN

           IF (n > -1) THEN
              CALL ApplyChildrenData(n)
              CALL CompSendChildrenData(n)         
           END IF
           CALL CompSendNeighboringChildren(n)     
           IF (step == 1 .AND. n > BaseLevel) THEN
              CALL CompSendOverlappingChildrenToOldNodes(n)
              CALL CompSendOverlappingChildrenToNewNodes(n)
              CALL CompSendOverlapsToOldNodesChildren(n)        
           ELSE
              CALL CompSendOverlapsToNodesOldChildren(n)        
           END IF
           CALL CompSendOverlapsNeighbors(n)
           IF (n > -1) THEN
              CALL CompRecvChildrenData(n)         
              CALL CompSendParentsData(n+1)         
           END IF
        END IF
        IF (n > -1) THEN
           CALL RestrictionFixups(n) 
           CALL AfterFixups(n)
        END IF
        IF (n > -1) THEN
           CALL PostRecvFluxes(n)         
           CALL PostSendFluxes(n)      
           IF (iThreaded == 0 .AND. n > 0)  CALL WaitingAdvances(n)
           CALL PrintAdvance(n)            
#if defined HYPRE                                            
           IF (lElliptic)  CALL Elliptic(n)
#endif
           IF (n < MaxLevel)  CALL UpdateChildMasks(n)
           CALL SyncFluxes(n) 
           CALL CompRecvFluxes(n)  
        END IF
        IF (n > 0)  CALL AccumulateFluxes(n)
        IF (n > -1) THEN
           CALL CompSendOverlaps(n)         
           CALL CompSendFluxes(n)          
        END IF
        IF (step == 2)  CALL NullifyNeighbors(n)
        IF (RestartStep) EXIT
     END DO
     IF (n > 0) THEN
        CALL CoarsenDataForParents(n) 
        CALL PostSendParentsData(n)                    
     END IF

   END SUBROUTINE AMR


Data Communication

There are four basic data routines that involve sharing of data between grids:

  • ProlongateParentsData() — Parents to children (Inter-Level)
  • ApplyChildrenData() — Children to parents (Inter-Level)
  • ApplyOverlaps() — Old Grids to current grids (Intra-Level)
  • SyncFluxes() — Current Grids to current grids (Intra-Level)

This description of communication is from a data domain perspective, though, and does not take interprocess communications into account. In a parallel application, the routines above will pass data to other processors as well as moving it between local grids.

All point-to-point communications involve a send and a receive. AstroBEAR's point-to-point communications are all non-blocking, which adds another dimension: the post, which sets up the transmission, and the complete (or Comp), which blocks on a transmission operation until it is completed.

  CALL PostRecvOverlaps
  ...
  CALL PostSendOverlaps
  CALL ApplyOverlaps
  CALL CompRecvOverlaps
  ...
  CALL CompSendOverlaps

To ensure the best possible performance, we want to post the data sends as soon as the data is available and wait until we absolutely need the data before completing the receives. This keeps the processors busy crunching numbers and swapping data between local grids while they wait for data to arrive. Completion of sends and posting of receives is a less precise art, but in general early posts and delayed receives lead to better performance.


Tree Communication

There are five tree operations that require some communication between processors:

  • CreateChildren()
  • InheritNeighborsChildren()
  • InheritOldNodeOverlapsChildren()
  • InheritNewNodeOverlapsChildren()
  • InheritOverlapsOldChildren()
  • InheritOverlapsNewChildren()

As in the case with data operations, each of these requires four communication calls (PostSend, PostRecv, CompSend, CompRecv) in order to overlap the computation with communication. In all of these cases, it is the node's children that are being communicated.

IMPORTANT: AMR() is a recursive algorithm, and many of the communications are inter-level communications. Consequently, a send or receive might be posted on one level and completed on another, which can make the algorithm tricky to follow. If you are a new user stepping through the AMR algorithm for the first time, start your traversal at n = -2 and follow along as you add levels.


Round 6: Scheduling and Threading

Several attempts have been made to incorporate threads into AstroBEAR in order to achieve global load balancing. There are still formidable technical issues to overcome with threading, so for the time being AstroBEAR uses a "pseudo-threaded" scheduling approach that mimics threading through careful management of the advances across all levels.

The scheduling code introduces three new subroutines into AMR():

  • ScheduledAdvanceGrids(n) — Calculate the workload for level n.
  • WaitingAdvances(n) — Advances grids on level n, and if there is time advances coarser grids while waiting for the other level n advances to finish.
  • CompleteAdvanceGrids(n) — Finishes advancing grids on level n.

The main advantage of the scheduling approach is the lack of external libraries. Implementing threads in a Fortran-based code like AstroBEAR requires specialized libraries or wrappers for POSIX threads. These libraries are not available on all clusters, which would require us to set up additional libraries on any machine where we wanted to run AstroBEAR.

We are considering two possible approaches for including threads in AstroBEAR:

  • Use threads to make the advance step of each level independent. Higher-level threads will need higher priorities, since their data is required to finish the lower-level steps.
  • Assign all of a level's operations to a thread. This approach would be promising, but it requires threads to communicate with other threads on different processors. This is a risky proposition, as older version of MPI are not thread-safe.

Threads introduce three new subroutines into AMR():

  • ThreadsInit() — Initializes thread variables and the threading environment.
  • LaunchAdvanceThread(n) — Creates a new thread for level n.
  • JoinAdvanceThread(n) — Rejoins the level-n thread with the main program after it has finished its advance.

For more information on threading see the Scrambler Threading page.

The final iteration of the AMR algorithm (minus the various Timer() calls) looks like this:

   RECURSIVE SUBROUTINE AMR(n)
     USE TreeLevelComms
     USE DataLevelComms
     INTEGER :: n, nSteps, step
     INTEGER :: iErr


     IF (n <= 0) nSteps=1 
     IF (n >  0) nSteps = 2 
     IF (n > BaseLevel) THEN
        CALL CompRecvGridsFromParents(n)
        CALL SortNodes(n)
        CALL CompSendGridsToChildren(n-1)         
        IF (n > -1) THEN
           CALL InitInfos(n)
           IF (n > 0)  CALL PostRecvParentsData(n)
        END IF
        CALL PostRecvOverlapsNeighbors(n)         
        CALL PostRecvOldNodeOverlaps(n)
        IF (n > 0) THEN
           CALL CompRecvParentsData(n)         
           CALL ProlongateParentsData(n)     
        END IF
        CALL CompRecvOverlapsNeighbors(n)        
        IF (n > -1)  CALL ChildMaskOverlaps(n)
        CALL CompRecvOldNodeOverlaps(n)        
     END IF

     DO step=1,nSteps
        levels(n)%CurrentLevelStep=levels(n)%CurrentLevelStep+1 
        levels(n)%step=step

        IF (step == 2) CALL UpdateOverlaps(n)
        IF (n > -1) THEN
           CALL GetLevelLoad(n)         
           CALL PostRecvOverlaps(n)        
           CALL PostSendOverlaps(n)        
           CALL ApplyOverlaps(n,step) 
           CALL CompRecvOverlaps(n)                    
        END IF

        IF (n > 0)  CALL AfterOverlaps(n)
        IF (n > -1) THEN
           CALL ParticleUpdate(n)
           CALL ApplyPhysicalBCs(n)
# if defined HYPRE
        CALL ApplyEllipticBC(n)
# endif
        END IF
        IF (n < MaxLevel) THEN 
           IF (n > -1)  CALL SetErrFlags(n)
           IF (step == 2 .OR. n <= BaseLevel)  CALL AgeNodesChildren(n)
           CALL AgeNodes(n+1)
           CALL CreateChildrens(n) 
           CALL DistributeChildrens(n)
           CALL PostSendGridsToChildren(n)         
           CALL PostRecvGridsFromParents(n+1)
           IF (n > -1)  CALL PostSendChildrenData(n)
           CALL PostRecvNeighboringChildren(n)
           CALL PostSendNeighboringChildren(n)
           IF (step == 1 .AND. n > BaseLevel) THEN
              CALL PostRecvOverlappingChildrenFromOldNodes(n)  
              CALL PostRecvOverlappingChildrenFromNewNodes(n)
              CALL PostSendOverlappingChildrenToOldNodes(n)    
              CALL PostSendOverlappingChildrenToNewNodes(n)
              CALL InheritOldNodeOverlapsChildren(n)
              CALL InheritNewNodeOverlapsChildren(n) 
              CALL InheritNeighborsChildren(n)
              CALL CompRecvOverlappingChildrenFromOldNodes(n)   
              CALL CompRecvOverlappingChildrenFromNewNodes(n)   
              CALL PostSendOverlapsToOldNodesChildren(n)
              CALL CompRecvNeighboringChildren(n)   
           ELSE
              CALL InheritOverlapsOldChildren(n) 
              CALL InheritNeighborsChildren(n)
              CALL CompRecvNeighboringChildren(n)   
              CALL InheritOverlapsNewChildren(n) 
              CALL PostSendOverlapsToNodesOldChildren(n)        
           END IF
           CALL PostSendOverlapsNeighbors(n)         
           IF (n > -1)  CALL PostRecvChildrenData(n)         
        END IF
        IF (n > -1)  CALL BeforeGlobalStep(n)

        !-------------------------------- Threading options -------------------------------------

        !Option 1:  Just create threads to do the advancing on each level
        IF (iThreaded <= 0) THEN
           IF (n > -1 .AND. iThreaded == 0)  CALL ScheduledAdvanceGrids(n)
           IF (n < MaxLevel) CALL AMR(n+1)
           IF (n > -1) THEN
              IF (iThreaded == 0) THEN                 
                 CALL CompleteAdvanceGrids(n)
              ELSE
                 CALL AdvanceGrids(n)
              END IF
           END IF
#if defined PTHREADS
        ELSEIF (iThreaded > 0) THEN            
           IF (n > -1)  CALL LaunchAdvanceThread(n)
           IF (n < MaxLevel) CALL AMR(n+1)
           IF (n > -1)  CALL JoinAdvanceThread(n)
# endif
        END IF
        ! ---------------------------- End threading options --------------------------------------

        IF (n < MaxLevel) THEN

           IF (n > -1) THEN
              CALL ApplyChildrenData(n)
              CALL CompSendChildrenData(n)         
           END IF
           CALL CompSendNeighboringChildren(n)     
           IF (step == 1 .AND. n > BaseLevel) THEN
              CALL CompSendOverlappingChildrenToOldNodes(n)
              CALL CompSendOverlappingChildrenToNewNodes(n)
              CALL CompSendOverlapsToOldNodesChildren(n)        
           ELSE
              CALL CompSendOverlapsToNodesOldChildren(n)        
           END IF
           CALL CompSendOverlapsNeighbors(n)
           IF (n > -1) THEN
              CALL CompRecvChildrenData(n)         
              CALL CompSendParentsData(n+1)         
           END IF
        END IF
        IF (n > -1) THEN
           CALL RestrictionFixups(n) 
           CALL AfterFixups(n)
        END IF
        IF (n > -1) THEN
           CALL PostRecvFluxes(n)         
           CALL PostSendFluxes(n)      
           IF (iThreaded == 0 .AND. n > 0)  CALL WaitingAdvances(n)
           CALL PrintAdvance(n)            
#if defined HYPRE                                            
           IF (lElliptic)  CALL Elliptic(n)
#endif
           IF (n < MaxLevel)  CALL UpdateChildMasks(n)
           CALL SyncFluxes(n) 
           CALL CompRecvFluxes(n)  
        END IF
        IF (n > 0)  CALL AccumulateFluxes(n)
        IF (n > -1) THEN
           CALL CompSendOverlaps(n)         
           CALL CompSendFluxes(n)          
        END IF
        IF (step == 2)  CALL NullifyNeighbors(n)
        IF (RestartStep) EXIT
     END DO
     IF (n > 0) THEN
        CALL CoarsenDataForParents(n) 
        CALL PostSendParentsData(n)                    
     END IF

   END SUBROUTINE AMR
Note: See TracWiki for help on using the wiki.