Estimated SUs for 3-D Mach Stems

Here are some stats for the 2-D runs. These were all run on BlueStreak at a resolution of 80 cells/rclump. I meant to run them on 128 cores, but I accidentally ran on 128 nodes, which is equivalent to 2048 cores. The 2-D runs were over-decomposed at that core count, so the 3-D runs should achieve better parallel efficiency, which means these estimates should be upper limits.

gamma   d (rclump)   run time (s)   SUs (thousands of CPU hrs)
5/3     20           33916          19.3
5/3     15           31027          17.7
5/3     10           23635          13.4
5/3     05           22282          12.7
1.40    20           34698          19.7
1.40    15           31184          17.7
1.40    10           29120          16.6
1.40    05           22086          12.6
1.20    20           41328          23.5
1.20    15           37236          21.2
1.20    10           32170          18.3
1.20    05           23586          13.4
1.01    20           52246          29.7
1.01    15           52459          29.8
1.01    10           47344          26.9
1.01    05           28754          16.4
Zcool   20           --             63.9
Zcool   15           --             61.2
Zcool   10           --             54.9
Zcool   05           --             35.6
TOTAL                               524.55
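As a quick consistency check, the SU column is just run time times 2048 cores, converted to CPU-hours. A short Python sketch, with run times copied from a few rows of the table:

```python
# SU accounting: SUs = run_time_seconds * cores / 3600, reported here in
# thousands of CPU-hours. Spot-check a few rows of the table above.
CORES = 2048

run_times = {          # seconds, copied from the table
    ("5/3", 20): 33916,
    ("5/3", 15): 31027,
    ("1.01", 5): 28754,
}

for (gamma, d), seconds in run_times.items():
    k_cpu_hrs = seconds * CORES / 3600 / 1e3
    print(f"gamma={gamma}, d={d:02d}: {k_cpu_hrs:.1f} thousand SUs")
```

This reproduces the 19.3, 17.7, and 16.4 entries above.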

If I used the same filling fractions and extended these numbers to 3-D, that would amount to estimating the run time for simulating rods instead of spheres. I went ahead and did it this way anyway, because the grid in the z-direction for these problems is relatively small compared to the other dimensions. So these estimates should be an upper limit.

To get the estimate for the 3-D runs, we need to compare the expected number of cell updates per root step in 2-D and 3-D.

2Dupdates = N0*(1 + 8*ff(0) + 64*ff(0)*ff(1) + 512*ff(0)*ff(1)*ff(2) + …)

3Dupdates = N0*(1 + 16*ff(0) + 256*ff(0)*ff(1) + 4096*ff(0)*ff(1)*ff(2) + …)

These can be written more generally as sums:

2Dupdates = N0 * Σ_{n=0..levels} 8^n * Π_{i=0..n} ff(i-1)

3Dupdates = N0 * Σ_{n=0..levels} 16^n * Π_{i=0..n} ff(i-1)

where n is the level, N0 is the total number of root-level cells, and ff(i-1) is the filling fraction of level i-1. Note that ff(-1) = 1. It is also important to keep in mind that 3-D cell updates are approximately twice as expensive as 2-D cell updates, so…

3DSUs/2DSUs = 2*(3Dupdates/2Dupdates)
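A minimal sketch of this bookkeeping in Python. The per-level factor comes from each refinement level doubling the resolution: 2^dim child cells times 2 substeps, i.e. 8 in 2-D and 16 in 3-D. The filling fractions below are placeholders for illustration, not the measured values:

```python
def updates(n_root, fill_fracs, dim):
    """Expected cell updates per root step for an AMR hierarchy.

    Each level refines by 2, so one refined cell costs 2**dim child cells
    times 2 substeps: a factor of 2**(dim + 1) per level (8 in 2-D, 16 in 3-D).
    fill_fracs[i] is the fraction of level i covered by level i+1.
    """
    factor = 2 ** (dim + 1)
    total, covered = 0.0, 1.0
    for n in range(len(fill_fracs) + 1):
        total += factor ** n * covered
        if n < len(fill_fracs):
            covered *= fill_fracs[n]
    return n_root * total

# hypothetical filling fractions and root grid, for illustration only
ff = [0.3, 0.2, 0.1]
n0 = 512 * 512

su_ratio = 2 * updates(n0, ff, 3) / updates(n0, ff, 2)  # 3-D updates ~2x cost
print(f"3DSUs/2DSUs ~ {su_ratio:.2f}")
```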

After using the above equations, this is what I get for the 3-D runs:

gamma   d (rclump)   SUs (millions)
5/3     20           22.3
5/3     15           20.3
5/3     10           15.5
5/3     05           14.8
1.40    20           22.5
1.40    15           20.3
1.40    10           19.0
1.40    05           14.6
1.20    20           26.8
1.20    15           24.1
1.20    10           20.8
1.20    05           15.5
1.01    20           33.7
1.01    15           34.2
1.01    10           30.8
1.01    05           18.7
Zcool   20           74.2
Zcool   15           71.5
Zcool   10           63.9
Zcool   05           41.6
TOTAL                605

Clearly this is not feasible. Even accounting for better efficiency in 3-D and perhaps lower filling fractions, I don't think I will be able to do all of these runs at this resolution. If I decrease the resolution to 40 cells/rclump, the total drops to 54.78 million SUs. I can improve this further by shrinking the domain in the x-direction and imposing periodic boundary conditions. This requires altering the resolution slightly, to 38.4 cells/rclump, and brings the total down to 35.56 million SUs. To summarize:

Resolution (cells/rclump)   Periodic BCs?   Total SUs (millions)
80                          no              605.0
76.8                        yes             390.9
40                          no              54.78
38.4                        yes             35.56

It is important to remember that these are upper limits, and there are several factors that will bring these numbers down:

  • The 3-D filling fractions are probably a bit lower than what I used.
  • The efficiency should be better for the 3-D runs. The 2-D runs typically had efficiencies of only 15%-60%, so there is considerable room for improvement.
  • These estimates are for BlueStreak. Kraken, for example, has processors that are approximately 1.625 times faster (2.6 GHz vs. 1.6 GHz).
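As an illustration of how such corrections compound (the efficiency and filling-fraction factors below are guesses, not measurements; only the clock-speed ratio comes from the numbers above):

```python
total_su = 35.56e6           # 38.4 cells/rclump with periodic BCs, from the table

clock_speedup = 2.6 / 1.6    # Kraken vs. BlueStreak clock ratio (~1.625)
efficiency_gain = 2.0        # guess: 3-D runs ~2x more efficient than 2-D
fill_frac_gain = 1.2         # guess: modestly lower 3-D filling fractions

adjusted = total_su / (clock_speedup * efficiency_gain * fill_frac_gain)
print(f"{adjusted / 1e6:.2f} million SUs")
```

With factors in this ballpark, the corrected total lands near the rough estimate quoted below.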

If I do some very rough estimates to try to account for these things, I can get the total SUs down to 9.85 million. Perhaps it would be possible and reasonable to do a subset of the runs at this resolution on Kraken. Below, I computed the average SUs and run time required for a run with 2048 cores on BlueStreak and Kraken.

Machine      SUs (thousands)   Run Time (days)
BlueStreak   800.2             16.3
Kraken       492.4             10.02
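The run-time column follows directly from the SU column at a fixed core count; as a sketch:

```python
CORES = 2048

def wall_clock_days(su_thousands):
    """Wall-clock days for a run consuming su_thousands (thousands of
    CPU-hours) spread across CORES cores."""
    return su_thousands * 1e3 / CORES / 24

print(f"BlueStreak: {wall_clock_days(800.2):.2f} days")
print(f"Kraken:     {wall_clock_days(492.4):.2f} days")
```

This reproduces the 16.3- and 10.02-day figures in the table.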

With run times this long, we run into the issue of having to do restarts due to the limited wall time on these machines. This will bring the efficiency down and increase the SUs and run time required. There is also the extra time spent waiting in the queue, which probably doubles the time it takes to complete a run.
