wiki:SuperGrids

Version 7 (modified by trac, 12 years ago) ( diff )

SuperGrids

Collecting all of the data on a given level on a given processor into a supergrid has several distinct advantages.

  • All redundant computation within a processor is no longer necessary.
  • Internal synchronization of overlaps and fluxes is also not necessary.
  • Routines that can handle supergrids can handle arbitrary geometries and so it is certainly possible to avoid performing coarse calculations that are being performed on a finer grid.

However, for now it may be sufficient to only have the hyperbolic advance be super-gridded/sweeped. This then allows for a pre-overlap sweep that can be 'threaded/scheduled' with receive data from parents and receive overlaps.

For each time step there is a collection of calculations that must be performed. Performing the calculations redundantly allows less frequent communication. If we assume that we will communicate the initial q and aux where needed and that we will perform all of the calculations necessary to update q and aux then we can define a set of calculations that must be performed locally. These calculations will involve different intermediate arrays (pressure, interface states, predictor fluxes, etc…) and different calculations depend on other previous calculations in a geometric way. For example if I want to compute the slope (dq) of cell i I will need to have the values of q at cell i-1 and i+1. We can perform these calculations in any particular order provided that we always calculate the dependent quantities first. We also however, need to keep the dependent quantities around until we are done needing them. For example, if we calculate fluxes and update cells on the left half of the grid, we will need to store the fluxes along the right edge since they will be needed to update the right half of the grid.

To determine if a given calculation needs to be buffered we can ask if any future calculations will require the same value. Of course checking each cell can be a bit cumbersome - so if instead we plan on updating an entire region simultaneously (using temporary buffers), we only need to compare the calculations done during the region update with the remaining calculations after the region update is completed.

So the basic algorithm is to split up the calculations into different subsets based on communication dependencies.

  • CanDoPreGhost - contains all of the calculations that can be done before receiving overlap data. This stage should be interrupted once the ghosting is complete.
  • MustDoPreSync - contains all of the calculations required to send fluxes to neighbors. This stage should begin as soon as ghosting is complete
  • MustDoPostSync - contains all of the final calculations that must happen after flux synchronization
  • MustDoSomeTime - contains all of the remaining calculations.

Before beginning any stage, buffers need to be calculated based on where the dependencies of remaining calculations overlap with scheduled calculations for that stage.

The algorithm would go as the following:

  • Receive Grids from Parents
  • PostRecvParentsData
  • PostRecvOverlapsNeighbors
  • PostRecvOldNodeOverlaps
  • if (n > BaseLevel) then
    • Queue up supergrid pre-overlap advance
    • CompRecvParentsData
    • ProlongateParentsData
    • CompRecvOverlapsNeighbors
    • ChildMaskOverlaps
    • CompRecvOldNodeOverlaps
  • end if
  • DO i=1,nSteps
    • PostRecvOverlaps
    • PostSendOverlaps
    • ApplyOverlaps
    • if (i == 1 and n > BaseLevel) then
      • CompRecvOverlaps
      • AfterOverlaps
    • end if
    • SetErrFlags
    • AgeNodesChildren
    • AgeNodes(n+1)
    • CreateChildren
    • PostSendGridsToChildren
    • PostRecvNeighboringOverlappingChildren
    • !PostSendNeighboringOVerlappingChildren
    • InheritNeighbors/OverlapsChildren
    • !ApplyPhysicalBC
    • if (i == 2 or n == BaseLevel)
      • Queue up supergrid pre-ghost advance
      • CompRecvOverlaps
      • AfterOverlaps
    • end if
    • PostSendChildrenData
    • Switch to post-ghost advance (and when it finishes automatically continue with final advance)
    • CompRecvNeighboringOverlappingChildren
    • PostSendChildrenNeighborsOverlaps
    • PostRecvChildrenData
    • PostRecvFluxes
    • CALL AMR(n+1)
    • CALL CompSends()
    • CALL CompRecvChildrenData() and buffer it in the supergridchildbuffer
    • Finish PostGhost Advance
    • PostSendFluxes
    • Queue up final Advance
    • CompRecvFluxes
    • Finish final Advance
    • Do Fixup Advance
    • AccumulateFluxes (!don't need to synchronize internal fluxes)
  • END DO
  • Send Stuff to parents
  • Instead of putting the comp send data to parents here - put it after the parents comp recv

Basically

  • Every CompRecv or CompSend routine needs to probe for a message and if their are no messages then call the supergrid advance.
  • CompRecvChildrenData and ApplyChildrenData need to store/replace the data in the sweep buffer.
  • The supergrid advance needs to support four different advance stages.
    • PreOverlap - Use existing old data that overlaps with new grids to begin advances
      • Try to avoid calculating data that will not be needed (skip regions where there were previously children)
      • Must cache all fluxes and emf's that are calculated since they may need to be differenced with unknown children (and do not calculate any final values - this will require holding on to beforesweep_ as well)
    • PreGhost - Do everything that can be done while buffering data that will be needed later
      • q_final
      • aux_final
      • fixupemf's and fixupfluxes
      • all other intermediate stencil pieces
      • Making sure to not calculate any quantity that will be received from child grids
    • PostGhost - Do everything that is needed to calculate fixupfluxes and emf's that will be synchronized with neighboring processors
      • not fixupfluxes that will be received from children
      • and buffer intermediate stencil data that will be needed by the final advance
    • FinalAdvance - Do everything else pulling fluxes and/or q from Child Data Buffers
    • FixupAdvance - Do final pass through data extracting/updating q_final, aux_final, fixupfluxes, and fixupemf's
  • In order to ensure data consistency without having to worry about 'corrections'
    • At no point should we use a flux that will be potentially modified
    • At no point should we calculate a quantity that will be replaced (q, flux, emf)
    • Child Data can go straight to the supergrid buffer
    • We do however, still need to synchronize fluxes/emfs with neighboring processors which is easier to do at the info level. This requires transferring data from the supergrid buffer to the info%fixupflux and info%emf - then synchronizing, and then afterward, updating the supergrid buffer.
  • We are also changing the way fixup emf's are handled. Now the aux fields will be directly coarsened as well as the fixupemf's. This has two advantages
    • No more restriction fixups
    • No need to perform additional computation (differencing of emf's).

Steps to move forward…

  • For a fixed grid computation we only need PreGhost, PostGhost, FinalAdvance, and FixupAdvance
    • PreGhost - Calculate internal q_new
    • PostGhost - Calculate external fixupfluxes/emfs
    • FinalAdvance - Calculate all of q_new that is possible
    • FixupAdvance - Update q_new and aux fields at edges with neighbors using received fluxes and emfs

Let A be the set of final calculations that are needed to update non-child regions

Let B be the set of calculations in A that need to be done before synchronization with neighboring external grids

Let D be the set of calculations that need to be done after synchronization and receipt of child values (Fixup Advance)

  • D * B = 0

Let E = A-D be the set of calculations that can be done before synchronization…

  • B * E = B
  • D * E = D * A - D * D = D - D = 0

Let F = The set of calculations that can be done with existing data

Let G = F * E be the preghost calculations

  • D * G = D * E * F = 0 * F = 0 (Fixup and preghost are disjoint)

Let G' be the completed preghost calculations

  • G' * G = G'
  • D * G' = 0 (Fixup and Completed preghost are disjoint)

Let H = B-(B*G') be the postghost calculations

  • H is in B so H * D < B * D = 0 (Fixup and post ghost are disjoint)
  • H * G' = B * G' - B * G' * G' = 0 (PostGhost and completed preghost are disjoint)

Let I = E - (H + G') = E - B - (G' - B*G') be the final calculations

  • I is in E so I * D < E * D = 0 (final and fixup are disjoint)
  • I * G' = 0 (final and preghost are disjoint)
  • I * H = 0 (final and postghost are disjoint)

So we have four mutually disjoint sets G', H, I, and D where G' + H + I + D = E + D = A

Now, just because we have four sets, does not mean we don't need to buffer common values. Any overlapping dependencies between these four sets of calculations should be buffered. Of course we cannot calculate H or I until after we've completed G'. However, let's say at any given instant we know G'. Since H and I will be a subset of E - G', we only need to buffer common dependencies between G' and E - G'. If we begin by assuming that G' = G and we calculate the dependencies between A and A - G, and we work our way across the grid keeping two buffers, one that is the intersection of A-G and G' and one that is the intersection of G-G' and G', then we will always have values buffered that our needed to finish G as well as A. Of course when we finish G', we need to combine the (G')/(G-G') buffer into the A-G' buffer so that the H will be able to use those values. We also need to calculate buffers for A-H and H as well as A-I and I, but those can be calculated once H and I are known respectively.

So…

  • Begin by calculating A, B, D, E, F, & G.
    • A can be calculated backwards from q3 and A3x, A3y, and A3z
    • B can be calculated backwards from f2x, f2y, f2z, e2x, e2y, and e2z where we only include fluxes that will not be supplied by children (ie that are in A)
    • D can be calculated by considering children and external neighbors
    • E is just A-D
    • G can be directly calculated from E by working forwards from old q and old aux (anding F)
  • H needs B and G' but we can discard (or turn G into G')
  • I needs E, H, G'
  • Calculate the overlaps between A-G and G and add them to the buffers.
  • Do G'.
  • Update the buffers with overlaps between G-G' and G'.
  • Calculate H and the overlapping dependencies between A-H and H. (Can discard G now)
  • Finish H and then calculate the dependencies between A-I and A. (Can discard H now)
  • Finish I (Can discard A and I now)
  • Finish D

Finally in AMR it gets even trickier since if you only have grids on a higher level - you could begin the advance as soon as you get your new grids. However, you may not know about your neighbors and children until sometime later… All you have is A* which includes child cells and F

If you start by assuming that the supergrid is surrounded by external neighbors, and that previously refined cells are still refined - you can estimate E*.

  • If you overestimate E, or underestimate A, or D, then you will not properly buffer common dependencies which could lead to "disastrous" or redundant computations. However, if you arrange the buffer so that the true buffer is a subset of the buffer - then these problems go away. However, to ensure the buffer contains the true buffer you need to assume that
  • D is everywhere (so you effectively must buffer all calculated fluxes and emfs)
  • A is everywhere (so assume there are no children)

It would be very nice to avoid doing calculations where there will likely be children, unfortunately, if we assume that there will be children somewhere - and there are not - then a lot of needed buffering will not happen. We could of course surround every potentially assumed child region with a worst-case buffer… So buffer as though there A was the entire thing - but only do calculations based on where children might be.

Essentially under-estimate G and overestimate A-G

  • So begin by assuming that A* has no children, but that D* is needed everywhere.
  • Calculate E* = A* - D*
  • Then reduce A* to A' by including a best guess for where children would be
  • Then calculate E' =A'-D*
  • And calculate G* = E' * F
  • Then calculate the dependency buffer for G* and A* which should include all of the dependency buffer between G* and A.
  • Then let G' be the part of G* that is actually done
  • Update the buffers between G' and G* as before
  • Now Calculate the true A, B, D, E
  • H = E - G'
  • I = E - H - G'
Note: See TracWiki for help on using the wiki.