Estimated SUs for 3-D Mach Stems
Here are some stats for the 2-D runs. These were all run on BlueStreak at a resolution of 80 cells/rclump. I meant to run them on 128 cores, but I accidentally ran them on 128 nodes, which is equivalent to 2048 cores. The 2-D runs were therefore spread over far more cores than they needed, so I should get better efficiency for my 3-D runs, and these estimates should be upper limits.
gamma | d (rclump) | run time (s) | SUs (thousands of CPU hrs) |
---|---|---|---|
5/3 | 20 | 33916 | 19.3 |
5/3 | 15 | 31027 | 17.7 |
5/3 | 10 | 23635 | 13.4 |
5/3 | 05 | 22282 | 12.7 |
1.40 | 20 | 34698 | 19.7 |
1.40 | 15 | 31184 | 17.7 |
1.40 | 10 | 29120 | 16.6 |
1.40 | 05 | 22086 | 12.6 |
1.20 | 20 | 41328 | 23.5 |
1.20 | 15 | 37236 | 21.2 |
1.20 | 10 | 32170 | 18.3 |
1.20 | 05 | 23586 | 13.4 |
1.01 | 20 | 52246 | 29.7 |
1.01 | 15 | 52459 | 29.8 |
1.01 | 10 | 47344 | 26.9 |
1.01 | 05 | 28754 | 16.4 |
Zcool | 20 | | 63.9 |
Zcool | 15 | | 61.2 |
Zcool | 10 | | 54.9 |
Zcool | 05 | | 35.6 |
TOTAL | | | 524.55 |
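The SU column above appears to follow directly from the run time and the 2048 cores each run used. A minimal sketch of that conversion, assuming one SU equals one core-hour (the function name `service_units` is hypothetical and only for illustration):

```python
def service_units(run_time_s: float, cores: int = 2048) -> float:
    """Return the cost of one run in thousands of CPU hours.

    Assumes 1 SU = 1 core-hour, which reproduces the table values.
    """
    cpu_hours = run_time_s * cores / 3600.0  # seconds -> hours, times cores
    return cpu_hours / 1000.0                # report in thousands


# Run times (s) copied from three rows of the table above.
run_times = {("5/3", 20): 33916, ("5/3", 15): 31027, ("1.01", 15): 52459}

for (gamma, d), t in run_times.items():
    print(f"gamma={gamma}, d={d}: {service_units(t):.1f} thousand SUs")
```

Running on 128 cores instead of 2048 would change the SU count only through the difference in parallel efficiency, which is why the numbers above are treated as upper limits.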
Using the same filling fractions to extend these numbers to 3-D is like estimating the run time for simulating rods instead of spheres. I went ahead and did it this way anyway, because the grid in the z-direction for these problems is relatively small compared to the other dimensions. So these estimates should be an upper limit.
To get the estimate for the 3-D runs, we need to compare the expected number of cell updates per root step.
2Dupdates = N0*(1 + 8*ff(0) + 64*ff(0)*ff(1) + 512*ff(0)*ff(1)*ff(2) + …)
3Dupdates = N0*(1 + 16*ff(0) + 256*ff(0)*ff(1) + 4096*ff(0)*ff(1)*ff(2) + …)
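These can be written more generally as sums:
$$\text{2Dupdates} = N_0 \sum_{n=0}^{n_{\max}} 8^{n} \prod_{i=0}^{n} \mathrm{ff}(i-1)$$
$$\text{3Dupdates} = N_0 \sum_{n=0}^{n_{\max}} 16^{n} \prod_{i=0}^{n} \mathrm{ff}(i-1)$$
where n is the level, n_max is the highest level of refinement, N_0 is the total number of root level cells and ff(i-1) is the filling fraction of level i-1. Note that ff(-1) = 1. It is also important to keep in mind that 3-D cell updates are approximately twice as expensive as 2-D cell updates, so…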
3DSUs/2DSUs = 2*(3Dupdates/2Dupdates)
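The update counts and the SU ratio above can be sketched in a few lines. This is only an illustration under stated assumptions: a refinement ratio of 2 (which gives the factors 8 and 16 per level), and made-up filling fractions and root cell counts, since the actual values are not listed here. The function names are hypothetical.

```python
def cell_updates(n_root: float, filling_fractions: list[float], ndim: int) -> float:
    """Expected cell updates per root step.

    filling_fractions[i] is ff(i), the filling fraction of level i.
    With a refinement ratio of 2, each level has 2**ndim times more
    cells per refined cell and takes 2 time steps per parent step.
    """
    factor = 2 ** (ndim + 1)  # 8 in 2-D, 16 in 3-D
    total = 1.0               # root level term (ff(-1) = 1)
    covered = 1.0             # running product ff(0)*ff(1)*...
    for level, ff in enumerate(filling_fractions, start=1):
        covered *= ff
        total += factor ** level * covered
    return n_root * total


def su_ratio_3d_to_2d(n_root_2d: float, n_root_3d: float,
                      filling_fractions: list[float],
                      cost_factor: float = 2.0) -> float:
    """3DSUs/2DSUs, with 3-D updates taken as cost_factor times as expensive."""
    updates_2d = cell_updates(n_root_2d, filling_fractions, ndim=2)
    updates_3d = cell_updates(n_root_3d, filling_fractions, ndim=3)
    return cost_factor * updates_3d / updates_2d


# Hypothetical inputs, not taken from the runs above.
ffs = [0.5, 0.25]
print(cell_updates(1, ffs, ndim=2))   # 1 + 8*0.5 + 64*0.5*0.25
print(cell_updates(1, ffs, ndim=3))   # 1 + 16*0.5 + 256*0.5*0.25
print(su_ratio_3d_to_2d(100, 1000, ffs))
```

Multiplying a 2-D SU figure by this ratio gives the corresponding 3-D estimate, provided the same filling fractions are used in both dimensions.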
After using the above equations, this is what I get for the 3-D runs:
gamma | d (rclump) | SUs (millions) |
---|---|---|
5/3 | 20 | 22.3 |
5/3 | 15 | 20.3 |
5/3 | 10 | 15.5 |
5/3 | 05 | 14.8 |
1.40 | 20 | 22.5 |
1.40 | 15 | 20.3 |
1.40 | 10 | 19.0 |
1.40 | 05 | 14.6 |
1.20 | 20 | 26.8 |
1.20 | 15 | 24.1 |
1.20 | 10 | 20.8 |
1.20 | 05 | 15.5 |
1.01 | 20 | 33.7 |
1.01 | 15 | 34.2 |
1.01 | 10 | 30.8 |
1.01 | 05 | 18.7 |
Zcool | 20 | 74.2 |
Zcool | 15 | 71.5 |
Zcool | 10 | 63.9 |
Zcool | 05 | 41.6 |
TOTAL | | 605 |
Clearly this is not feasible. Even if I could account for better efficiency in 3-D and perhaps lower filling fractions, I don't think I will be able to do all of these runs at this resolution. If I decrease the resolution to 40 cells/rclump, then my total drops to 54.78 million SUs. I can improve this further by decreasing my domain in the x-direction and imposing periodic boundary conditions. I have to alter the resolution slightly to do this, so this would be for 38.4 cells/rclump. For this setup, my total drops to 35.56 million SUs. To summarize:
Resolution (cells/rclump) | Periodic BCs? | Total SUs (millions) |
---|---|---|
80 | no | 605.0 |
76.8 | yes | 390.9 |
40 | no | 54.78 |
38.4 | yes | 35.56 |
It is important to remember that these are upper limits, and there are several factors that will bring these numbers down:
- The 3-D filling fractions are probably a bit lower than what I used.
- The efficiency should be better for the 3-D runs. The 2-D runs typically had efficiencies of 15%-60%. In a perfect scenario, my efficiency would be 100%, which would be a significant increase.
- These estimates are for BlueStreak. Kraken, for example, has processors that are approximately 1.625 times faster (2.6 GHz vs. 1.6 GHz).
If I do some very rough estimates to try to account for these things, I can get the total SUs down to 9.85 million. Perhaps it would be possible and reasonable to do a subset of the runs at this resolution on Kraken. Below, I computed the average SUs and run time required for a run with 2048 cores on BlueStreak and Kraken.
Machine | SUs (thousands) | Run Time (days) |
---|---|---|
BlueStreak | 800.2 | 16.3 |
Kraken | 492.4 | 10.02 |
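The numbers in this table are consistent with two simple conversions: SUs scale inversely with the clock speed ratio, and run time is the SU count divided by the number of cores. A rough sketch, assuming 1 SU = 1 core-hour and that performance scales linearly with clock speed (both simplifications; the function names are hypothetical):

```python
HOURS_PER_DAY = 24


def rescale_sus(sus_thousands: float, clock_ratio: float = 2.6 / 1.6) -> float:
    """Scale an SU estimate to a machine whose cores are clock_ratio times faster."""
    return sus_thousands / clock_ratio


def run_time_days(sus_thousands: float, cores: int = 2048) -> float:
    """Wall-clock days for one run, given its cost in thousands of core-hours."""
    return sus_thousands * 1000.0 / cores / HOURS_PER_DAY


bluestreak_sus = 800.2                     # thousands, from the table above
kraken_sus = rescale_sus(bluestreak_sus)   # 2.6 GHz vs. 1.6 GHz

print(f"BlueStreak: {bluestreak_sus:.1f} thousand SUs, {run_time_days(bluestreak_sus):.1f} days")
print(f"Kraken:     {kraken_sus:.1f} thousand SUs, {run_time_days(kraken_sus):.2f} days")
```

These run times ignore queue waits and restart overhead, so the real time to completion will be longer.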
With run times this long, we run into the issue of having to do restarts due to the limited wall time on these machines. This will bring the efficiency down and increase the SUs and run time required. There is also the extra time spent waiting in the queue, which probably doubles the time it takes to get a run completed.