\documentclass{article}
\usepackage{amssymb,amsmath}
\usepackage{verbatim}
\usepackage{graphicx}
\usepackage{caption}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
%\usepackage[square, comma, sort&compress]{natbib}
\usepackage[square,sort,comma,numbers]{natbib}
\usepackage{ctable}
\usepackage{subfigure}

\begin{document}

\title{Code Performance of AstroBEAR 2.0 on Stampede}
\author{Adam Frank, Jonathan J. Carroll-Nellenback, Baowei Liu, Mart\'{i}n Huarte-Espinosa,\\ Shule Li, Erica L. Kaminski, Edward Hansen}

\maketitle

--- To be updated by Baowei

%Here we present strong scaling results for AstroBEAR. In Figure \ref{fig:diskKr}, we report scaling test results (with Disk module) on Kraken at NICS. Each compute node of Kraken has two six-core AMD Opterons, so we use $120$, $240$, $480$, $960$ and $1200$ processors. The resolution we used for these test are $128^{3} + 5$ level AMR which is same as the computation we are planning to do. The strong scaling test plot of the current code shows a slope $-0.848$ (Figure \ref{fig:diskKr}) while the slope for perfect scaling is $-1$. This shows AstroBEAR has an excellent scaling on Kraken. In Figure \ref{fig:jetSt}, we report scaling test results (with 3D Jet module) on Stampede at TACC. Each compute node of Stampede has two Intel E5 8-core (Sandy Bridge) processors and an Intel Xeon Phi 61-core (Knights Corner) coprocessor, so we use $128$, $256$, $512$, and $1024$ processors. The resolution we used for these test are $21\times105\times21 + 5$ level AMR which is same as the computation we are planning to do. The strong scaling test plot of the current code shows a slope $-0.101$ for Hydro jet and $-0.967$ for MHD jet (Figure \ref{fig:jetSt}) while the slope for perfect scaling is $-1$. This shows AstroBEAR has an excellent scaling on Stampede also.

Here we present the strong (Figure~\ref{fig:CFstrong}) and weak (Figure~\ref{fig:CFweak}) scaling results for AstroBEAR with the 3D Colliding Flows module on Stampede. The strong scaling tests are run at exactly the same resolution as our proposed production runs ($40^{3}$ base grid + 5 levels of AMR), but for a much shorter final time (0.5\% of one frame, while the production runs require 200 frames) and without I/O.
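For clarity, the scaling slopes quoted in the figure captions below refer to the exponent $s$ of runtime versus core count in log-log space; a minimal statement of the convention (with $T$ the wall-clock time and $N_p$ the number of processors) is
\begin{equation}
T(N_p) \propto N_p^{\,s}, \qquad s \simeq \frac{\Delta \log T}{\Delta \log N_p},
\end{equation}
so that perfect strong scaling corresponds to $s = -1$, and a shallower (less negative) slope indicates a loss of parallel efficiency.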
\begin{figure}[ht]
\centering
%\subfigure[Strong Scaling Test for AstroBEAR with Colliding Flow Module on Stampede]{
\includegraphics[scale=0.9]{CFstrongOnStampede_f.eps}
\caption{Current strong scaling behavior of AstroBEAR with the 3D Colliding Flows module on Stampede. These tests use exactly the same resolution as our proposed production runs ($40^{3}$ + 5 levels of AMR for Colliding Flows and $21\times105\times21$ + 5 levels of AMR for Jet), but a much shorter final time (1\% of one frame, while the production runs need 1000 frames) and no I/O. The current code with the Colliding Flows module shows excellent scaling on Stampede up to 1024 processors (slope $=-0.83$, where perfect scaling has slope $-1$). The scaling degrades for this particular module and production setup on larger processor counts (slope $=-0.68$), although the code with other modules behaves very well on other systems, e.g., Kraken. Our typical production runs use 1024 processors, and the SUs we request are based on the estimated runtime at 1024 processors (see the estimate sketched below this figure).}
\label{fig:CFstrong}
\end{figure}
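As a schematic illustration of how the request follows from these timings (the symbols here are generic placeholders rather than quantities taken from the proposal), the SU estimate is
\begin{equation}
\mathrm{SU} \;\approx\; N_p \times t_{\mathrm{frame}}(N_p) \times N_{\mathrm{frames}},
\end{equation}
where $N_p = 1024$ is the production core count, $t_{\mathrm{frame}}(N_p)$ is the measured wall-clock time (in hours) to advance the simulation by one frame at that core count, and $N_{\mathrm{frames}}$ is the number of frames required by the production run.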

\begin{figure}[ht]
\centering
\includegraphics[scale=0.9]{CFweakOnStampede_f.eps}
\caption{Current weak scaling behavior of AstroBEAR with the 3D Colliding Flows module. The test uses a fixed grid with the same workload on each processor ($64^3$ cells per processor). Weak scaling is a good test of the code's data communication, since the workload on each processor is about the same. The runtime efficiency of AstroBEAR drops only 15\% on up to 1096 processors and 27\% on up to 4096 processors, which shows very good weak scaling (the efficiency is defined below the figure). Our typical production runs will be on 1024 cores.}
\label{fig:CFweak}
\end{figure}
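The efficiency quoted in Figure~\ref{fig:CFweak} can be read in the usual weak-scaling sense (assuming the reference point is the smallest core count in the test, denoted $N_{p,0}$ here): with the per-core workload held fixed at $64^3$ cells, the runtime $T$ at $N_p$ cores is compared with the runtime at the reference count,
\begin{equation}
E(N_p) = \frac{T(N_{p,0})}{T(N_p)},
\end{equation}
so that a 15\% drop in efficiency corresponds to $E \approx 0.85$.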

\begin{figure}[ht]
\centering
\includegraphics[scale=0.9]{TSFstrongOnStampede.eps}
\caption{Current strong scaling behavior of AstroBEAR with the 3D Triggered Star Formation module on Stampede. The base dimensions for these tests are the same as in our production runs, but with fewer levels of AMR (3 levels of AMR, while the planned production runs use 4 levels). To obtain a better estimate, we restart from the point where star formation begins and run for a much shorter final time (1\% of one frame, while the production runs need 100 frames) without I/O. The current code with the Triggered Star Formation module shows excellent scaling on Stampede up to 1024 processors (slope $=-0.84$, where perfect scaling has slope $-1$). The scaling degrades on larger processor counts (slope $=-0.69$ on up to 4096 processors). Our typical production runs use 1024 processors, and the SUs we request are based on the estimated runtime at 1024 processors.}
\label{fig:TSFMHD}
\end{figure}

\begin{figure}[ht]
\centering
\includegraphics[scale=0.9]{TSFstrongOnStampede.eps}
\caption{Current strong scaling behavior of AstroBEAR with the 3D Triggered Star Formation module on Stampede. The base dimensions for these tests are the same as in our production runs, but with fewer levels of AMR (2 levels of AMR, while the planned production runs use 4 levels). To obtain a better estimate, we restart from the point where star formation begins and run for a much shorter final time (1\% of one frame, while the production runs need 1000 frames) without I/O. The current code with the Triggered Star Formation module shows excellent scaling on Stampede up to 1024 processors (slope $=-0.84$, where perfect scaling has slope $-1$). The scaling degrades on larger processor counts (slope $=-0.69$ on up to 4096 processors). Our typical production runs use 1024 processors, and the SUs we request are based on the estimated runtime at 1024 processors.}
\label{fig:oldAlg}
\end{figure}
\end{document}