Version 18 (modified 11 years ago)
Scaling on Bluestreak
Intro:
All tests were done with the Shear15 run, roughly ¼ of the way through the simulation; I am assuming the other runs will scale similarly. AstroBEAR was built with the hypre library (v2.9.0) without global partitioning.
Method:
In global.data, I increased the number of frames by a factor of 10, so the final frame went from 200 to 2000, and changed the restart frame from chombo00050.hdf to chombo00500.hdf. Running the simulation then produces frames with dt equal to 1/10 of the original frame interval. From this I can estimate the actual run time per full frame by multiplying the measured run time for a 1/10-dt frame by 10. This allows for much faster scaling tests.
Results:
Nodes | Start time | Frame times | Avg. run time (excluding first frame) | Avg. write time | Frames/hr
---|---|---|---|---|---
32 | 3:09 | 3:43, 4:15, 4:49 | 33 min / 0.1 frame | 1.9 min | 0.19
128 | 3:09 | 3:29, 3:43, 3:57, 4:12 | 14.3 min / 0.1 frame | 3.7 min | 0.55
256 | 3:14 | 3:29, 3:42, 3:55, 4:09 | 13.3 min / 0.1 frame | 6.8 min | 0.83
512 | 5:17 | 5:34, 5:47, 6:00, 6:13 | 13 min / 0.1 frame | 9.4 min | 1.3
In the table, I exclude the first frame when averaging the run time, since it is unusually slow due to the extra time needed to reload the grids on restart.
Note! As the number of nodes increases, the time to write to file increases as well. To get a truer estimate of the run time for one full frame, the write time must be subtracted before multiplying by 10, and added back only afterward (see the next section).
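The per-frame averages in the table can be recomputed from the wall-clock frame stamps. A minimal sketch (the helper names here are my own, not part of AstroBEAR):

```python
def to_minutes(clock):
    """Convert an H:MM wall-clock stamp to absolute minutes."""
    h, m = clock.split(":")
    return int(h) * 60 + int(m)

def avg_interval(frames):
    """Average minutes per 0.1 frame. Using only the gaps *between*
    successive frame stamps automatically excludes the slow first
    (restart) frame, which is measured from the job start time."""
    t = [to_minutes(s) for s in frames]
    gaps = [b - a for a, b in zip(t, t[1:])]
    return sum(gaps) / len(gaps)

print(avg_interval(["3:43", "4:15", "4:49"]))                     # 33.0 (32-node row)
print(round(avg_interval(["3:29", "3:43", "3:57", "4:12"]), 1))   # 14.3 (128-node row)
```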
Frames / hour calculation:
Start by calculating the framerate R. Since the data are given in minutes, R naturally has units of minutes/frame. Remember to subtract the write time W before multiplying by 10, since only the computation time scales with the size of the time step; the write time is added back afterward. With T the measured time per 0.1 frame, R = 10 × (T − W) + W.
This gives a framerate in minutes per (full) frame. To convert to hours/frame, divide by 60; to get frames/hour, invert R.
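The calculation above can be sketched in a few lines, applied to each row of the table (the function name is my own):

```python
def frames_per_hour(t_tenth, write):
    """Frames/hr given T (min per 0.1 frame) and W (min write time).
    Only the compute portion scales by 10; the file is written once."""
    r_min_per_frame = 10 * (t_tenth - write) + write
    return 60.0 / r_min_per_frame

# (nodes, T, W) taken from the table above
for nodes, t, w in [(32, 33, 1.9), (128, 14.3, 3.7),
                    (256, 13.3, 6.8), (512, 13, 9.4)]:
    print(nodes, round(frames_per_hour(t, w), 2))
```

This reproduces the frames/hr column (0.19, 0.55, ~0.83, ~1.3) to rounding.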
32 Nodes Example
An example for the 32 node case: T = 33 min and W = 1.9 min, so R = 10 × (33 − 1.9) + 1.9 = 312.9 min/frame ≈ 5.2 hr/frame. To get frames/hour, take the inverse, R⁻¹ ≈ 0.19 frames/hr. Note — this assumes the write time at the end of a normal-dt time step will be the same.
Similarly, the 128-, 256-, and 512-node cases give 0.55, 0.83, and 1.3 frames/hr.
Note the scaling inefficiency: doubling the number of nodes does not cut the run time in half.
Choosing the best set of runs:
The question is, which combination of jobs gets me the most frames per hour? With 3 simulations to run and 512 nodes available, the viable options are 1) one job on 512 nodes (1.3 frames/hr), 2) two jobs on 256 nodes each (2 × 0.83 = 1.66 frames/hr), and 3) two jobs on 128 nodes plus a third on 256 (2 × 0.55 + 0.83 = 1.93 frames/hr). Adding up the totals shows that option 3) yields the most frames per hour.
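The tally can be checked directly from the frames/hr column of the table (a sketch; the option labels are my own):

```python
rate = {128: 0.55, 256: 0.83, 512: 1.3}  # frames/hr per job, from the table

options = {
    "1 job @ 512":            [512],
    "2 jobs @ 256":           [256, 256],
    "2 jobs @ 128 + 1 @ 256": [128, 128, 256],
}

# Total throughput of each option is just the sum over its jobs.
totals = {name: sum(rate[n] for n in nodes) for name, nodes in options.items()}
best = max(totals, key=totals.get)
print(best)  # 2 jobs @ 128 + 1 @ 256
```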
Attachments (4)
- chart_1(1).png (8.4 KB) - added 10 years ago.
- chart_2(1).png (8.5 KB) - added 10 years ago.
- chart_3(1).png (8.4 KB) - added 10 years ago.
- chart_4(1).png (7.9 KB) - added 10 years ago.