\documentclass{article}
\usepackage{amssymb,amsmath}
\usepackage{float}
\usepackage{verbatim}
\usepackage{graphicx}
\usepackage{caption}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
%\usepackage[square, comma, sort&compress]{natbib}
\usepackage[square,sort,comma,numbers]{natbib}
\usepackage{ctable}
\usepackage{subfigure}

\begin{document}

\title{Code Performance of AstroBEAR2.0 on Stampede}
\author{Adam Frank, Jonathan J. Carroll-Nellenback, Baowei Liu, Mart\'{i}n Huarte-Espinosa,\\ Jason Nordhaus, Shule Li, Erica L. Kaminski, Edward Hansen}

\maketitle

%Here we present strong scaling results for AstroBEAR. In Figure \ref{fig:diskKr}, we report scaling test results (with Disk module) on Kraken at NICS. Each compute node of Kraken has two six-core AMD Opterons, so we use $120$, $240$, $480$, $960$ and $1200$ processors. The resolution we used for these test are $128^{3} + 5$ level AMR which is same as the computation we are planning to do. The strong scaling test plot of the current code shows a slope $-0.848$ (Figure \ref{fig:diskKr}) while the slope for perfect scaling is $-1$. This shows AstroBEAR has an excellent scaling on Kraken. In Figure \ref{fig:jetSt}, we report scaling test results (with 3D Jet module) on Stampede at TACC. Each compute node of Stampede has two Intel E5 8-core (Sandy Bridge) processors and an Intel Xeon Phi 61-core (Knights Corner) coprocessor, so we use $128$, $256$, $512$, and $1024$ processors. The resolution we used for these test are $21\times105\times21 + 5$ level AMR which is same as the computation we are planning to do. The strong scaling test plot of the current code shows a slope $-0.101$ for Hydro jet and $-0.967$ for MHD jet (Figure \ref{fig:jetSt}) while the slope for perfect scaling is $-1$. This shows AstroBEAR has an excellent scaling on Stampede also.
\section{Abstract}
Strong and weak scaling behavior of the current AstroBEAR2.0 code was tested on Stampede at TACC. For the strong scaling tests with the Colliding Flows module and the 3D Triggered Star Formation module, we obtained excellent scaling results on up to 1024 cores on Stampede (slopes of $-0.83$ and $-0.84$ respectively, while the perfect scaling slope is $-1.0$). The scaling of the code with these two modules degrades (to slopes of $-0.68$ and $-0.69$ respectively) on 4096 cores. For the weak scaling with a fixed grid, we found the running efficiency drops by about $17\%$ on up to 1024 cores on Stampede and by $27\%$ on 4096 cores, due to the increase in communication between the processors. Our typical production jobs will run on 1024 cores.

\section{Colliding Flows Module}
Figure (\ref{fig:CFstrong}) shows the strong scaling test results for AstroBEAR with the 3D Colliding Flows module on Stampede. The strong scaling tests are done with the exact same resolution as our proposed production runs ($40^{3}$ + 5 levels of AMR), but for a much shorter final time (0.1\% of one frame, while the production runs need 200 frames) and without I/O. The current code with the Colliding Flows module shows excellent scaling on Stampede (slope $-0.83$, while perfect scaling has slope $-1$) up to 1024 cores. The behavior drops when running on more cores (slope $-0.68$). Our typical production runs are on 1024 cores, and the SUs we request are based on the estimated time on 1024 cores.

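For reference, the quoted slopes are power-law fits to the wall-clock time. The efficiency figure below is a back-of-the-envelope illustration of what such a slope implies, assuming the fit holds across the whole core range (the conversion to an efficiency is ours, not a separate measurement):
\begin{equation}
T(N) \;\propto\; N^{s},
\end{equation}
where $T$ is the wall-clock time, $N$ is the number of cores, and $s=-1$ corresponds to perfect strong scaling. Under this fit, an eightfold increase in core count with $s=-0.83$ yields a speedup of $8^{0.83}\approx 5.6$ rather than the ideal $8$, i.e.\ a parallel efficiency of roughly $70\%$ over that range.
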
\begin{figure}[ht]
\centering
%\subfigure[Strong Scaling Test for AstroBEAR with Colliding Flow Module on Stampede]{
\includegraphics[scale=0.9]{CFstrongOnStampede_f.eps}
\caption{Current strong scaling behavior of AstroBEAR with the 3D Colliding Flows module on Stampede. For these tests we run with the exact same resolution as our proposed production runs ($40^{3}$ + 5 levels of AMR for Colliding Flows), but for a much shorter final time (0.1\% of one frame, while the production runs need 200 frames) and without I/O. The current code with the Colliding Flows module shows excellent scaling on Stampede (slope $-0.83$, while perfect scaling has slope $-1$) up to 1024 processors. The behavior drops with this particular module and production setup when running on more processors (slope $-0.68$). Our typical production runs are on 1024 processors, and the SUs we request are based on the estimated time on 1024 processors.}
\label{fig:CFstrong}
\end{figure}

In Figure (\ref{fig:CFweak}) we show the weak scaling test results for AstroBEAR with the 3D Colliding Flows module on Stampede. In these tests we use the same resolution as in the strong scaling tests, but on a fixed grid, with the number of cells on each core kept constant ($64^3$ per core). Weak scaling is a good test of the data communication part of the code since the work load on each processor is about the same. The runtime efficiency of AstroBEAR drops by only 17\% on up to 1024 processors and by 27\% on up to 4096 processors, which shows very good weak scaling. Our typical production runs will be on 1024 cores.

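To make the quoted efficiency drops concrete, the definition we have in mind is sketched below (the notation, and taking the smallest core count tested as the reference $N_{\mathrm{ref}}$, are ours):
\begin{equation}
E(N) \;=\; \frac{T(N_{\mathrm{ref}})}{T(N)} \qquad \text{(fixed $64^3$ cells per core)},
\end{equation}
where perfect weak scaling gives $E(N)=1$, since the work per core is constant and only the communication grows. The quoted drops correspond to $E(1024)\approx 0.83$ and $E(4096)\approx 0.73$.
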
\begin{figure}[ht]
\centering
\includegraphics[scale=0.9]{CFweakOnStampede_f.eps}
\caption{Current weak scaling behavior of AstroBEAR with the 3D Colliding Flows module. The test is done on a fixed grid with the same work load on each processor (each processor has $64^3$ cells). Weak scaling is a good test of the data communication part of the code since the work load on each processor is about the same. The runtime efficiency of AstroBEAR drops by only 17\% on up to 1024 processors and by 27\% on up to 4096 processors, which shows very good weak scaling. Our typical production runs will be on 1024 cores.}
\label{fig:CFweak}
\end{figure}

\section{3D Triggered Star Formation}
Figure (\ref{fig:TSFhydro}) shows the strong scaling behavior of AstroBEAR with the 3D Triggered Star Formation module on Stampede. The base dimensions for these tests are the same as in our production runs, but with fewer levels of AMR (3 levels, while the planned production runs have 4). We think that restarting from a point where stars start to form (Figure \ref{fig:TSFstart}) gives a better estimate of the running time, but we do not have a similar restart point with 4 levels of AMR, so we use 3 levels of AMR for the scaling tests instead. We estimate the running time with 4 levels of AMR will be about 8 times longer than with 3 levels, although this depends strongly on the filling fraction, which varies with time. For these tests we run the code for 20\% of one frame (the production runs need 100 frames) and without I/O. The current code with the Triggered Star Formation module shows excellent scaling on Stampede with slope $-0.84$ (perfect scaling has slope $-1$) up to 1024 processors. The behavior drops when running on more processors (slope $-0.69$ on up to 4096 processors), which is consistent with the runs of the Colliding Flows module.

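The factor of 8 can be motivated with a rough cost estimate (a sketch, assuming a refinement ratio of 2 and that the run time is dominated by the finest level; the filling fraction $f$ below is not something we measured directly):
\begin{equation}
\frac{T_{4\ \mathrm{levels}}}{T_{3\ \mathrm{levels}}} \;\sim\; 2^{3} \times 2 \times f \;=\; 16f,
\end{equation}
where $2^{3}$ is the increase in cell count when a region is refined once more in 3D, the extra factor of $2$ accounts for the halved (CFL-limited) time step on the new level, and $f \le 1$ is the filling fraction of the added level. A filling fraction of order one half recovers the factor of $\sim 8$ quoted above, which is why the estimate depends so strongly on the filling fraction.
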
\begin{figure}[ht]
\centering
\includegraphics[scale=0.9]{TSFstrongOnStampede.eps}
\caption{Current strong scaling behavior of AstroBEAR with the 3D Triggered Star Formation module on Stampede. The base dimensions for these tests are the same as in our production runs, but with fewer levels of AMR (3 levels, while the planned production runs have 4). To get a better estimate, we restart from a point where star formation starts, and run for a much shorter final time (1\% of one frame, while the production runs need 100 frames) and without I/O. The current code with the Triggered Star Formation module shows excellent scaling on Stampede (slope $-0.84$, while perfect scaling has slope $-1$) up to 1024 processors. The behavior drops when running on more processors (slope $-0.69$ on up to 4096 processors). Our typical production runs are on 1024 processors, and the SUs we request are based on the estimated time on 1024 processors.}
\label{fig:TSFhydro}
\end{figure}

\begin{figure}[ht]
\centering
\includegraphics[width=\columnwidth]{TSF.png}
\caption{The starting point of the strong scaling tests for AstroBEAR with the 3D Triggered Star Formation module. It shows star formation being triggered at the center of the image. This starting point has the same base dimensions as our proposed runs and 3 levels of AMR (we propose to run with 4 levels of AMR).}
\label{fig:TSFstart}
\end{figure}

\section{Summary}
Strong and weak scaling behavior of the current AstroBEAR2.0 code was tested on Stampede at TACC. For the strong scaling tests with the Colliding Flows module and the 3D Triggered Star Formation module, we obtained very good scaling results on up to 1024 cores on Stampede. We expect the same behavior on other systems with architectures similar to Stampede (for example, Gordon at SDSC).
\end{document}