Running AstroBEAR
NOTE: This tutorial assumes that you have successfully compiled AstroBEAR and set up a problem directory (denoted here by run_dir). If you have not done so, please return to the User Guide for instructions on building your code and your run directory.
Running A Job
There are two ways to run a job: directly and through a queueing system. The direct method is basically like any other Linux/Unix operation: the user starts an n-processor run at the command line and lets it go. Most personal desktop machines such as grass use the direct method; their user base is manageable, they're simply too small to really warrant a scheduler, and they're seldom used for production runs anyway.
Large shared clusters such as bluehive usually use some kind of scheduling system. Rather than simply running AstroBEAR on one of these systems, you create a job script and submit it to the queue. The job then runs on one of the cluster's compute nodes. Most scheduling systems have an interactive mode you can use for short debugging sessions, but you still have to submit an interactive job to the queue.
IMPORTANT: DO NOT run a job using the direct method if your cluster uses a job scheduling system. Otherwise, you will start your job on the cluster's head-node, the head-node will slow to a crawl, other users will get upset and you will get a cross e-mail from the cluster administrators. If you need/prefer the direct method, then submit an interactive job or stick to an unscheduled cluster.
Grass
One of the aforementioned multicore desktops, grass, does not have a scheduler. To run a job on grass, simply move to your run_dir and enter the following command:
mpiexec -np <nprocs> astrobear > outfile.out &
where <nprocs> is the number of processors for your job. This starts an AstroBEAR run and redirects the standard output to outfile.out. The & makes it a background process, so you can watch the output by using the command
tail -f outfile.out
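For example, a hypothetical 8-processor run on grass (the processor count and the output file name outfile.out are only placeholders) would look like:
cd <run_dir>                               # move to your problem directory
mpiexec -np 8 astrobear > outfile.out &    # start an 8-processor run in the background
tail -f outfile.out                        # follow the output as it is written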
Bluehive
- AstroBEAR is distributed with a sample bluehive job script that you can modify to suit your needs. Execute the following commands to copy the required files to your run directory run_dir:
cp <astrobear_directory>/submission_scripts/bluehive.pbs <run_dir>
cp <astrobear_directory>/submission_scripts/launch.s <run_dir>
This copies over the PBS script bluehive uses to start the job, as well as the launch.s script we use to set up certain environment variables.
- Move to <run_dir> and open the bluehive.pbs file. The line beginning with #PBS -q has five parameters that you will need to customize (a hedged example of a customized script header appears after this list):
nodes: The number of nodes your job will use. The number of processors you request is equal to nodes * ppn, and the most processors you can use on bluehive at any given time is 64. This is true whether you are running one 64-processor job or two 32-processor jobs.
ppn: The number of processors per node that you will use. Bluehive has eight processors per node, so if you need more than eight processors you will have to vary the nodes parameter.
pvmem: The amount of system memory reserved for each processor. The easiest thing to do is just set this to 2000m. If you do this, be sure to set vmem to -1.
vmem: The total amount of system memory reserved for the job. This is a more flexible option than pvmem, in that it creates a single pool of memory for all processors to draw from. This should be set to <2000*nodes*ppn>m. If you do this, be sure to set pvmem to -1.
walltime: The amount of wall-clock time your job should run. In general, shorter wall times are better if you can get away with them, because they get scheduled sooner. The standard queue on bluehive has a maximum walltime of 5 days (120 hours), but a 119-hour job runs almost as long as a 120-hour job and sometimes fools the scheduler into running it sooner.
- Find the line in bluehive.pbs starting with #PBS -M and change the e-mail address on that line to yours.
- Find the line in bluehive.pbs starting with #PBS -N. This is the name your job will display in the queue, so change it to something you will find useful/descriptive.
- Find the following line in bluehive.pbs:
mpiexec -np <nprocs> ./launch.s > mpi_outfile.opt.$PBS_JOBID
This is the command that runs AstroBEAR; change <nprocs> to match the number of processors your job will use.
- Save your changes to bluehive.pbs and enter the following command at the prompt (make sure you are in run_dir):
qsub bluehive.pbs
If everything is okay in your PBS script, bluehive will return a line of the following form:
<PBS_JOBID>.bhsn-int.bluehive.crc.private
where <PBS_JOBID> is a numerical ID assigned to your job. You can check up on your job(s) using the command
showq -u <username>
To get an estimate of your job's start time, use the command
showstart <PBS_JOBID>
The showstart command doesn't always tell you anything useful, but it can be helpful for estimates.
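As a reference, here is a minimal sketch of the customized portion of bluehive.pbs for a hypothetical 32-processor job (4 nodes x 8 ppn, per-processor memory, just under the 5-day limit). The e-mail address and job name are placeholders, and the resource request is written here in standard PBS -l form; the exact directive layout in the distributed script may differ, so change only the values, not the structure:
# 4 nodes x 8 ppn = 32 processors, 2000m per processor, just under the 120-hour limit
#PBS -l nodes=4:ppn=8,pvmem=2000m,walltime=119:00:00
# placeholder e-mail address for job notifications
#PBS -M your_address@example.com
# placeholder job name shown in the queue
#PBS -N my_bluehive_run

mpiexec -np 32 ./launch.s > mpi_outfile.opt.$PBS_JOBID
Because pvmem is set in this sketch, vmem would be set to -1 (or vice versa), so the two memory requests do not conflict.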
For more information about using the bluehive cluster, check out the CRC's Bluehive wiki page. For those who prefer to read offline, they also have a tutorial in PDF form.
BlueGene
- AstroBEAR is distributed with a sample bluegene job script that you can modify to suit your needs. Execute the following commands to copy the required files to your run directory run_dir:
cp <astrobear_directory>/submission_scripts/bluegene.cmd <run_dir>
cp <astrobear_directory>/submission_scripts/launch.s <run_dir>
This copies over the LoadLeveler script bluegene uses to start the job, as well as the launch.s script we use to set up certain environment variables.
- Move to the run_dir directory, open bluegene.cmd, and find the line that starts with #@ job_name =. This is the name the job will display in the queue; change it to something you find more useful/descriptive.
- Find the line in bluegene.cmd that starts with #@ wall_clock_limit. This sets the wall clock time for your job. Change this to the time you want (remembering that BlueGene's maximum wall clock time is 3 days).
- Find the line in bluegene.cmd that starts with #@ bg_size. This is the number of nodes that will be allocated to the job; change this to the number of nodes you want. BlueGene's minimum node allocation is 64 nodes, and the CRC recommends you allocate blocks in powers of two (64, 128, 256, etc.) for optimal performance. Each node has four cores on it, so the value you pick for bg_size should be nprocs / 4. (A hedged example of the edited keywords appears after this list.)
- In bluegene.cmd, find the line that starts with mpirun -exp_env BG_MAXALIGNEXP -np <nprocs> and is not commented out (i.e., it does not begin with a #). In that line, change <nprocs> to the number of processors you want.
- Save your changes to bluegene.cmd and enter the command
llsubmit bluegene.cmd
This will submit your job to the queue. If all goes well, BlueGene will return the output:
llsubmit: The job "bgpfen.crc.rochester.edu.<JOB_ID>" has been submitted.
where <JOB_ID> is a numerical ID that the LoadLeveler queue will use to identify your job. You can check up on your job(s) using the command
llq -b
To get an estimate of your job's start time, use the command
llq -s bgpfen.<JOB_ID>
The llq -s command doesn't always tell you anything useful, but it can be helpful for estimates.
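As a reference, here is a minimal sketch of the edited LoadLeveler keywords for a hypothetical 64-node (256-processor) job running for the 3-day maximum. The job name is a placeholder, and everything else on the mpirun line should be left exactly as it appears in the distributed bluegene.cmd:
# placeholder job name shown in the queue
#@ job_name = my_bluegene_run
# 3-day (72-hour) maximum wall clock time
#@ wall_clock_limit = 72:00:00
# 64 nodes x 4 cores per node = 256 processors
#@ bg_size = 64

# only <nprocs> changes; the rest of this line stays as distributed
mpirun -exp_env BG_MAXALIGNEXP -np 256 ...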
For more information about BlueGene, visit the CRC's BlueGene wiki page.
Itasca
- AstroBEAR is distributed with a sample itasca job script that you can modify to suit your needs. Execute the following commands to copy the required files to your run directory run_dir:
cp <astrobear_directory>/submission_scripts/itasca.pbs <run_dir>
cp <astrobear_directory>/submission_scripts/launch.s <run_dir>
This copies over the Torque script that itasca uses to start the job, as well as the launch.s script we use to set up certain environment variables.
- Move to <run_dir> and open the itasca.pbs file. The line beginning with #PBS -l has four parameters that you will need to customize (a hedged example of a customized header appears after this list):
walltime: The amount of wall-clock time you intend for your job to take. Itasca's standard queue has a maximum of 24 hours, but shorter jobs are generally better. The University of Minnesota group gets a certain number of CPU hours from MSI that they're willing to share with us, so don't be extravagant.
mem: Like the vmem parameter on bluehive, this represents the total amount of memory the job will use. Itasca has 3 GB per core, so a mem value of nprocs * 3000m should be about right (e.g., for 64 processors, use mem=192000m).
nodes: The number of nodes your job will use. The number of processors you request is equal to nodes * ppn. While itasca has no limit on the number of nodes you can request (beyond the hard cluster limit of 1088 nodes), smaller jobs nonetheless run sooner. Plus, large jobs eat into the UMN group's CPU-hour allocation, so please be considerate.
ppn: The number of processors per node that you will use. Itasca has eight processors per node, so if you need more than eight processors you will have to vary the nodes parameter.
- Find the line in itasca.pbs starting with #PBS -N. This is the name your job will display in the queue, so change it to something you will find useful/descriptive.
- Look for the cd <directory> line in itasca.pbs and change the directory path to that of your problem directory run_dir.
- Save your changes to itasca.pbs and enter the following command at the prompt (make sure you are in the same directory as itasca.pbs):
qsub itasca.pbs
You can check up on your job(s) using the command
showq -u <username>
To get an estimate of your job's start time, use the command
showstart <PBS_JOBID>
where <PBS_JOBID> is a numerical ID assigned to your job (the job ID is returned upon submission and can also be retrieved with the showq -u <username> command). The showstart command doesn't always tell you anything useful, but it can be helpful for estimates.
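As a reference, here is a minimal sketch of the edited lines in itasca.pbs for a hypothetical 64-processor job (8 nodes x 8 ppn, 3 GB per core, 12 hours of wall time). The wall time, job name, and directory path are placeholders, and only the values on these lines should change:
# 12 hours of wall time, 64 x 3000m = 192000m of memory, 8 nodes x 8 ppn = 64 processors
#PBS -l walltime=12:00:00,mem=192000m,nodes=8:ppn=8
# placeholder job name shown in the queue
#PBS -N my_itasca_run

# placeholder path; point this at your problem directory
cd /path/to/run_dir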
For more information about using the itasca cluster, check out MSI's itasca page.