Version 4 (modified by trac, 12 years ago)

Analyzing Scrambler

For the most part, Scrambler performs its various tasks sequentially on the group of info structures on each level. By tracking when a processor begins a stage on a level and when it finishes that stage on that level, we can better analyze Scrambler's performance.
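As an illustration of this kind of tracking, here is a minimal Python sketch that records a start and end time for every (level, stage) pair on one processor. The `StageLog` class, its method names, and the injectable `clock` argument are hypothetical and are not taken from Scrambler itself.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageLog:
    """Collects (level, stage, start, end) records for one processor."""

    def __init__(self, clock=time.perf_counter):
        # The clock is injectable so the log can be exercised with fake times.
        self.clock = clock
        self.records = []

    @contextmanager
    def stage(self, level, stage_id):
        """Time everything executed inside the `with` block."""
        start = self.clock()
        try:
            yield
        finally:
            self.records.append((level, stage_id, start, self.clock()))

    def time_per_stage(self):
        """Total wall time spent in each stage, summed over all levels."""
        totals = defaultdict(float)
        for _level, stage_id, start, end in self.records:
            totals[stage_id] += end - start
        return dict(totals)


if __name__ == "__main__":
    log = StageLog()
    with log.stage(level=0, stage_id=20):
        pass  # the advance for level 0 would run here
    print(log.time_per_stage())
```

Plotting the recorded start and end times against the stage number, one curve per processor, would give a picture similar to the graphs on this page.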

For instance, if we plot the algorithm level and stage as a function of wall time for each of two processors in a run, we get the following:

Graph showing execution flow for two processors

Note that nearly all of the time is spent on stage 20, which, not surprisingly, is the advance stage. If we zoom in on the interval between advances, we get:

Close up showing time spent between advances

The largest time sink between advances is stage 57. Stage 57 is the backup stage, in which each processor makes copies of all of its data structures so that it can restart if something goes wrong. This takes about 1/200th of the advance time, which is not too expensive given the savings, and we could eventually use the probability of an advance going bad to determine how many time steps to go between backups. In any case, this stage should not affect our scalability.

The next largest amount of time spent by a processor on a non-advance stage is stage 25 on processor 1. For this problem, stage 25 essentially involves calculating the cell-centered B-fields from the aux fields, and it should take the same amount of time on both processors. Why it doesn't is very puzzling. If we look at a later time step, we find that this mystery goes away…
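The idea of choosing the backup interval from a failure probability can be sketched with a first-order estimate in the spirit of Young's checkpoint-interval approximation. This is only an assumption about how one might do it, not something Scrambler implements. If a backup costs `c` advances and each advance fails with a small probability `p`, then backing up every `n` steps costs roughly `c/n` per step for the backups plus `p*n/2` per step for redone work, which is minimized near `n = sqrt(2c/p)`. The failure probability used below is purely hypothetical.

```python
import math


def backup_overhead(n, backup_cost, fail_prob):
    """Approximate overhead per step (in units of one advance) when backing up every n steps.

    backup_cost / n : cost of the backup spread over n steps
    fail_prob * n/2 : expected redone work, assuming a failure loses n/2 steps on average
    """
    return backup_cost / n + fail_prob * n / 2


def optimal_backup_interval(backup_cost, fail_prob):
    """Number of steps between backups that minimizes the approximate overhead above."""
    return max(1, round(math.sqrt(2 * backup_cost / fail_prob)))


# Backup cost of 1/200 of an advance (from the measurement above) and a
# hypothetical failure probability of 1e-4 per advance.
interval = optimal_backup_interval(1 / 200, 1e-4)
print(interval)
```

The estimate is only meaningful when failures are rare and independent from step to step. Under those assumptions, a cheaper backup or a less reliable advance both push the interval toward backing up more often.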

Same run but at a later time

Now we see that processor 0 finishes the advance later than processor 1, and that processor 1 has to wait in stage 23 for processor 0 to catch up. Stage 23 involves a collective communication, so this is not surprising. While both processors are doing the same amount of work, noise in the system creates a 0.5% difference in execution time.

The next largest time sinks are stages 42, 43, 5, and 52, which correspond to SendOverlaps, RecvOverlaps, ApplyOverlaps, and RecvFluxes. All of the sends and receives are called twice: once to post the operation and once to complete it. The first receive is always the shortest, since it only involves calling a non-blocking receive. The first send is also usually somewhat brief, since it only involves packing the data before calling a non-blocking send. The second receive is usually the longest, since it has to wait for messages to arrive as well as unpack and manipulate the data. Finally, the second send should normally be brief, since it is just waiting for the send (which was begun earlier) to complete.
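To make the waiting at the collective concrete, the following sketch computes how long each processor idles if the collective behaves as a full synchronization point, which is an assumption and does not hold for every collective operation. The function name and the wall times are hypothetical. The times were chosen only to reproduce a 0.5% difference.

```python
def collective_wait_times(arrival_times):
    """Idle time per processor at a collective that nobody leaves until all have arrived.

    arrival_times maps a processor rank to the wall time at which it reaches the collective.
    """
    last = max(arrival_times.values())
    return {rank: last - t for rank, t in arrival_times.items()}


# Hypothetical wall times (seconds) at which each processor finishes the advance.
arrivals = {0: 100.5, 1: 100.0}
waits = collective_wait_times(arrivals)
print(waits)
```

Under this model the slowest processor never waits, and every other processor's wait equals its lead over the slowest one, so small timing noise in the advance shows up directly as idle time in the collective stage.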
