Optimization with OpenMP on Blue Gene/Q
Replace the vectorized FORALL loop with parallelized DO loops in sweep_scheme.f90. For example, replace:
    DO i=mB(1,1), mB(1,2)
       FORALL(j=mB(2,1):mB(2,2),k=mB(3,1):mB(3,2))
          beforesweepstep_%data(beforesweepstep_%x(i),j,k,1,1:NrHydroVars) = &
               Info%q(index+i,j,k,1:NrHydroVars)
       END FORALL
    END DO
by
    !$OMP PARALLEL DO PRIVATE(k,j,i) COLLAPSE(3)
    DO k=mB(3,1),mB(3,2)
       DO j=mB(2,1),mB(2,2)
          DO i=mB(1,1), mB(1,2)
             beforesweepstep_%data(1:NrHydroVars,1,i,j,beforesweepstep_%x(k)) = Info%q(i,j,index+k,1:NrHydroVars)
          END DO
       END DO
    END DO
    !$OMP END PARALLEL DO
Test results on Blue Streak:
- 128^3 + 4 AMR, Current Revision Running Time on 512 cores: 224.57 (Tasks per node=16)

| Tasks per node | OMP_NUM_THREADS | Total Running Time |
|---|---|---|
| 1 | 32 | 3375.17 |
| 2 | 16 | 2019.94 |
| 4 | 8 | 1265.58 |
| 8 | 4 | 1052.74 |
| 16 | 2 | 907.62 |
| 32 | 1 | 1151.02 |

| Tasks per node | OMP_NUM_THREADS | Total Running Time |
|---|---|---|
| 1 | 64 | >3600 |
| 2 | 32 | 2039.45 |
| 4 | 16 | 1181.27 |
| 8 | 8 | 946.2 |
| 16 | 4 | 741.81 |
| 32 | 2 | 737.68 |
| 64 | 1 | 877.07 |
- 32^3 + 4 AMR, Current Revision Running Time on 512 cores: 33.26 (Tasks per node=16)

| Tasks per node | OMP_NUM_THREADS | Total Running Time |
|---|---|---|
| 1 | 64 | 191.42 |
| 2 | 32 | 122.68 |
| 4 | 16 | 82.43 |
| 8 | 8 | 70.78 |
| 16 | 4 | 72.78 |
| 32 | 2 | 85.65 |
| 64 | 1 | 129.95 |

| Tasks per node | OMP_NUM_THREADS | Total Running Time |
|---|---|---|
| 1 | 32 | 164.59 |
| 2 | 16 | 105.90 |
| 4 | 8 | 86.67 |
| 8 | 4 | 79.62 |
| 16 | 2 | 84.98 |
| 32 | 1 | 128.47 |
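To compare layouts quickly, the timing rows can be scanned for the fastest entry. The following is only a minimal sketch, assuming the rows are kept as plain `tasks threads time` text. The helper name `fastest_layout` is hypothetical. The sample rows are copied from the first test case above, using the runs where tasks per node times OMP_NUM_THREADS equals 64. The `>3600` entry is not a measured time, so it is skipped.

```shell
#!/bin/bash
# Print the row with the smallest running time from "tasks threads time" input.
# fastest_layout is a hypothetical helper name, not part of the code base.
fastest_layout() {
  awk '
    # Only consider rows whose third field is a plain number (skips ">3600").
    $3 ~ /^[0-9]+(\.[0-9]+)?$/ {
      if (best == "" || $3 + 0 < best + 0) { best = $3; row = $0 }
    }
    END { if (row != "") print row }
  '
}

# Rows copied from the timings above: tasks per node, OMP_NUM_THREADS, time.
fastest_layout <<'EOF'
1 64 >3600
2 32 2039.45
4 16 1181.27
8 8 946.2
16 4 741.81
32 2 737.68
64 1 877.07
EOF
```

For these rows the sketch prints the line for 32 tasks per node with 2 threads each, which is only slightly faster than 16 tasks with 4 threads.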
The job submission script on Blue Streak looks like this:
    #!/bin/bash
    #SBATCH -J strongTest
    #SBATCH --nodes=32
    #SBATCH --ntasks-per-node=4
    #SBATCH -p debug
    #SBATCH -t 01:00:00

    module purge
    module load mpi-xl
    module load hdf5-1.8.8-MPI-XL
    module load fftw-3.3.2-MPI-XL
    module load hypre-2.8.0b-MPI-XL

    ulimit -s unlimited

    export OMP_NUM_THREADS=16 #1node 8 processors
    srun astrobear > strong_4ThreadsperNode_X16.log
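The timings above appear to vary `--ntasks-per-node` and `OMP_NUM_THREADS` together so that their product stays at 32 or 64 threads per node. As a rough sketch of how those pairs could be enumerated before editing the job script, the hypothetical helper `thread_layouts` below doubles the task count at each step. It assumes the thread total is a power of two.

```shell
#!/bin/bash
# List every (tasks per node, OMP_NUM_THREADS) pair whose product equals the
# requested number of threads per node, doubling the task count each step.
# thread_layouts is a hypothetical helper name; it assumes a power-of-two total.
thread_layouts() {
  local total=$1 tasks=1
  while [ "$tasks" -le "$total" ]; do
    echo "$tasks $((total / tasks))"
    tasks=$((tasks * 2))
  done
}

# Show the settings to substitute into the job script for 64 threads per node.
thread_layouts 64 | while read -r tasks threads; do
  echo "--ntasks-per-node=$tasks OMP_NUM_THREADS=$threads"
done
```

Each printed line corresponds to one run: the first value goes into the `#SBATCH --ntasks-per-node` line and the second into the `export OMP_NUM_THREADS` line.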
Swapping the DO loop nesting order to i, j, k makes little difference in running time compared with the k, j, i case:
    !$OMP PARALLEL DO PRIVATE(i,j,k) COLLAPSE(3)
    DO i=mB(1,1), mB(1,2)
       DO j=mB(2,1),mB(2,2)
          DO k=mB(3,1),mB(3,2)
             beforesweepstep_%data(1:NrHydroVars,1,i,j,beforesweepstep_%x(k)) = Info%q(i,j,index+k,1:NrHydroVars)
          END DO
       END DO
    END DO
    !$OMP END PARALLEL DO
| Tasks per node | OMP_NUM_THREADS | Total Running Time |
|---|---|---|
| 1 | 16 | >3600 |
| 2 | 16 | 2099.57 |
| 4 | 16 | 1208.53 |
| 8 | 8 | 912.56 |
| 16 | 4 | 758.78 |
| 16 | 2 | 969.74 |
| 16 | 1 | 1436.98 |
- Posted: 12 years ago (Updated: 12 years ago)
- Author: Baowei Liu
- Categories: (none)