SUs required for 3D pulsed jet runs
I took a look at the standard out from the 2.5D simulations that were completed on Stampede. Here are estimates for the total number of cell updates required at the end of the simulation and the number of SUs for the entire run:
Beta | Cell Updates | SUs |
---|---|---|
Hydro | 181139252 | 3145 |
5 | 144062238 | 5270 |
1 | 170121573 | 7638 |
0.4 | 164839549 | 11472 |
The total number of cell updates was easily obtained from the filling fractions in the standard out. The following equations were used:
N0 = nx*ny
N1 = 4*N0*ff(0)
N2 = 4*N1*ff(1) = 16*N0*ff(0)*ff(1)
...
2.5Dupdates = N0 + 2*N1 + 4*N2 + 8*N3 + ... = N0*(1 + 8*ff(0) + 64*ff(0)*ff(1) + 512*ff(0)*ff(1)*ff(2) + ...)

Here ff(l) is the fraction of level l covered by level l+1 grids, each refinement quadruples the cell count in 2D, and level l takes 2^l substeps per root step. These are per-root-step counts; multiplying by the number of root steps gives the total.
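This sum is easy to evaluate numerically. Below is a minimal Python sketch; the function name and the example filling fractions are hypothetical, and ff[l] is taken to be the fraction of level l covered by level l+1 grids:

```python
def cell_updates_2d(nx, ny, ff, root_steps=1):
    """Estimate total cell updates for a 2.5D AMR run.

    ff[l] is the filling fraction of level l+1 relative to level l.
    In 2D each refined cell splits into 4, and level l is advanced
    2**l times per root step.
    """
    n = nx * ny              # N0: cells on the base grid
    updates = n              # one update per root step on level 0
    for l in range(1, len(ff) + 1):
        n = 4 * n * ff[l - 1]     # N_l = 4 * N_(l-1) * ff(l-1)
        updates += (2 ** l) * n   # level l takes 2**l substeps per root step
    return updates * root_steps

# Hypothetical example: 42 x 420 base grid, ff(0)=0.5, ff(1)=0.25
print(cell_updates_2d(42, 420, [0.5, 0.25]))
```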
Now, going from 2.5D to 3D leads to many more computational cells. There is a factor of 2 because we can no longer impose symmetry and simulate only half of the jet. So, to keep the same resolution, the base grid goes from 42 x 420 x 1 to 84 x 420 x 84, which is 168x more cells on the base grid. Figuring out how many cells there are at higher levels is a bit trickier.
I analyzed the 2.5D data in VisIt to get an estimate of what the filling fractions would be in 3D. I used the expression
r = coord(Mesh)[0]
Then I made a pseudocolor plot of r and queried the weighted variable sum for each AMR level separately. The ratio of these sums gave me the filling fraction for each level. I then used these equations to get the number of cell updates:
N0 = nx*ny*nz
N1 = 8*N0*ff(0)
N2 = 8*N1*ff(1) = 64*N0*ff(0)*ff(1)
...
3Dupdates = N0 + 2*N1 + 4*N2 + 8*N3 + ... = N0*(1 + 16*ff(0) + 256*ff(0)*ff(1) + 4096*ff(0)*ff(1)*ff(2) + ...)
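The same sum in 3D, where each refinement splits a cell into 8 rather than 4 (again a sketch; the function name and any example filling fractions are hypothetical):

```python
def cell_updates_3d(nx, ny, nz, ff, root_steps=1):
    """Estimate total cell updates for a 3D AMR run.

    ff[l] is the filling fraction of level l+1 relative to level l.
    In 3D each refined cell splits into 8; level l still takes
    2**l substeps per root step.
    """
    n = nx * ny * nz         # N0: cells on the base grid
    updates = n
    for l in range(1, len(ff) + 1):
        n = 8 * n * ff[l - 1]     # N_l = 8 * N_(l-1) * ff(l-1)
        updates += (2 ** l) * n
    return updates * root_steps

# Base grid from the text, with no refined levels:
print(cell_updates_3d(84, 420, 84, []))  # → 2963520
```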
If I assume that the number of root steps remains constant, then the ratio (3Dupdates)/(2.5Dupdates) tells me how much longer the 3D runs will take. However, there is an additional factor of 2 because Kraken processors are about half as fast as Stampede processors, and yet another factor of approximately 2 because 3D cell updates are more expensive than 2D cell updates. So, to convert the SUs column from the first table to the second, you would use:
3DSUs = 4*(3Dupdates)/(2.5Dupdates)*(2.5DSUs)
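As a sanity check, this conversion can be applied to the Hydro row (a sketch; the function name is mine, and the two factors of 2 are the Kraken-vs-Stampede speed and the 3D-vs-2D update cost discussed above):

```python
def su_3d(su_25d, updates_3d, updates_25d, arch_factor=2.0, cost_factor=2.0):
    """Convert 2.5D Stampede SUs to an estimated 3D Kraken SU count.

    arch_factor: Kraken cores run at roughly half the speed of Stampede cores.
    cost_factor: a 3D cell update costs roughly twice a 2D one.
    """
    return arch_factor * cost_factor * (updates_3d / updates_25d) * su_25d

# Hydro row: 3145 SUs and 181139252 updates in 2.5D,
# versus an estimated 1178305614616 updates in 3D
print(round(su_3d(3145, 1178305614616, 181139252) / 1e6, 1))  # → 81.8 (million SUs)
```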
Below are my estimates for the number of cell updates, and how long the 3D runs should take on Kraken (with 10 cells per cooling length).
Beta | Cell Updates | SUs (millions) |
---|---|---|
Hydro | 1178305614616 | 81.8 |
5 | 956767405342 | 140 |
1 | 1132203286417 | 203 |
0.4 | 1361329586503 | 379 |
In total, this is about 804.1 million SUs on Kraken. Clearly, this is not going to work, so we have to lower the resolution. I redid the calculation for different resolutions. Below is a table of the total SUs for all 4 runs.
Base Grid | Levels | Cells/Lcool | SUs (millions) |
---|---|---|---|
84 x 420 x 84 | 7 | 10 | 804.1 |
84 x 420 x 84 | 6 | 5 | 140.4 |
84 x 420 x 84 | 5 | 2.5 | 24.5 |
21 x 105 x 21 | 7 | 2.5 | 24.5 |
21 x 105 x 21 | 6 | 1.25 | 4.03 |
21 x 105 x 21 | 5 | 0.625 | 0.576 |
So the only one that seems feasible is the last one. One thing to keep in mind is that this Lcool is based on a cooling length of 4 AU, which is what we expect for the strongest-cooling regions within the jet. This corresponds to the high density, high velocity shocks (1000 cm^-3 and 60 km/s). There are shocks stronger than this, especially at the head of the jet and near the injection region when pulses enter the grid. We may never be able to resolve these fully.
So even though a resolution might show 0.625 cells/Lcool, we are still fully resolving cooling lengths of 64 AU with 10 cells. This is adequate for most of the low density (100 cm^-3) shocks, and even some of the low velocity shocks (< 40 km/s) regardless of density. So this should be fine for the clumps that have already traveled some distance. At the last frame of the simulation, this looks like it'd be good enough for at least 3 of the 5 clumps, maybe 4.
In other words, at this resolution, we do not resolve the cooling length at the beginning of a clump's "lifetime", but we do end up resolving it later.
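The scaling in this argument is linear in the cooling length: 0.625 cells across the 4 AU reference length is the same resolution as 10 cells across 64 AU. A one-line sketch (the function name is hypothetical):

```python
def cells_per_cooling_length(cells_per_ref, l_ref_au, l_cool_au):
    """Number of cells resolving a cooling length l_cool_au, given a
    resolution of cells_per_ref cells across the reference length l_ref_au."""
    return cells_per_ref * l_cool_au / l_ref_au

# 0.625 cells per 4 AU reference length, applied to a 64 AU cooling length:
print(cells_per_cooling_length(0.625, 4.0, 64.0))  # → 10.0
```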
UPDATE
As was discussed in today's meeting (9/23), I redid all the calculations using frame 65 rather than frame 100. The ratio (3Dupdates/2.5Dupdates) differs from frame to frame because the filling fractions change over time. Furthermore, during a simulation, the filling fractions in 2.5D won't necessarily change in the same way that they do in 3D. The idea here is that the filling fractions at about 2/3 of the simulation time are most characteristic of the time-averaged filling fractions.
I only redid the last table, since it is the most useful. The numbers improved, but it appears that these runs still require many SUs.
Base Grid | Levels | Cells/Lcool | SUs (millions) |
---|---|---|---|
84 x 420 x 84 | 7 | 10 | 645.0 |
84 x 420 x 84 | 6 | 5 | 116.6 |
84 x 420 x 84 | 5 | 2.5 | 19.79 |
21 x 105 x 21 | 7 | 2.5 | 19.79 |
21 x 105 x 21 | 6 | 1.25 | 3.028 |
21 x 105 x 21 | 5 | 0.625 | 0.424 |
Comments
Ideally you could estimate how many cell updates are required… For a 2D (or 2.5D) run, this is just the number of cells on each level x 2^level x the number of root steps… In VisIt you could probably query the number of zones on each level… or use the filling fractions from the standard out…
N_1 = N_0*ff(0)
N_2 = N_1*ff(1) = N_0*ff(0)*ff(1)
And so on…
And then cell updates = (N_0 + 2*N_1+4*N_2+8*N_3) * # root level steps
When going from 2.5D to 3D, you need to multiply each level by the effective number of cells in the 'z' direction you expect. This will be something like the jet radius/dx
So N_3 ⇒ N_3*R/dx(3)
And then the number of cell updates can be calculated as before… (N_0 + 2*N_1 + 4*N_2 + 8*N_3)*# root level steps
We can also write this as
(N_0 + 4*N_1 + 16*N_2 + 64*N_3)*# root level steps * R/dx(0), since R/dx(level) = 2^level * R/dx(0)
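Written as code, this substitution weights each 2D level count N_l by 4^l: one factor of 2^l for the substeps per root step and one because R/dx(l) = 2^l * R/dx(0). A sketch with made-up numbers (the function name is hypothetical):

```python
def updates_3d_from_2d(level_counts, root_steps, r_over_dx0):
    """Estimate 3D cell updates from 2D per-level cell counts.

    level_counts[l] = N_l, the 2D cell count on AMR level l. Each level-l
    cell is extruded into R/dx(l) = 2**l * (R/dx(0)) cells in z and takes
    2**l substeps per root step, giving a 4**l weight per level.
    """
    weighted = sum((4 ** l) * n for l, n in enumerate(level_counts))
    return weighted * root_steps * r_over_dx0

# Hypothetical: 100 base cells, 40 on level 1, 16 on level 2,
# 50 root steps, R/dx(0) = 3
print(updates_3d_from_2d([100, 40, 16], 50, 3))  # → 77400
```

The extra factors of 2 mentioned below (the full jet, and the higher cost of a 3D cell update) would multiply this result.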
Then don't forget the factor of 2 for the whole jet – and another factor of ~2 for 3D cell updates compared to 2D cell updates….
I imagine your efficiency will improve somewhat – as you will have more cells to distribute (assuming the same number of processors)… And the longer the walltime the more efficient the code will run…