- Linux-Fu
- Command Line Reference
- Moving data from simulation directories to longer term storage
- Individually Tarballing Files
- Bash Shell Configuration: .bashrc and .bash_profile
- Command Aliasing
- Exporting Environment Variables
- Persistent SCP connections
- Regular Expressions in emacs
- Monitoring the contents of a directory with 'watch'
- Pushd/Popd
- Killing old defunct astrobear processes
- Changing the endianness of bov files from little to big
- FORTRAN Command Line and Integer/String Read/Writes
- The Modules package
- Scaling Tests scripts
- Projecting okc files into curve files for plotting particle positions …
Linux-Fu
Got something that helps you use Linux? Know of a neat trick, or just figure something out? Let everyone know about it here.
Command Line Reference
Click here for a more complete listing of Linux commands, grouped by purpose.
Emacs — quick reference card.
vim — quick reference card.
Linux — quick reference card.
Moving data from simulation directories to longer term storage
- Just navigate to your simulation directory
cd /scratch/johndoe/MySimulation1
- And then run rsync
rsync -avz ./ user@host:/data/SimulationDir
- You can also detach the process from the terminal by using nohup…
nohup rsync -avz ./ user@host:/data/SimulationDir
<enter password>
<ctrl+z>
bg %1
logout
- If you come back later and there are more frames, you can run the same command and it will only transfer the new frames.
Individually Tarballing Files
It is often useful to tarball files before transferring them between compute clusters and local storage. Large files should be rolled into separate tarballs to improve transfer efficiency.
Examples
To tarball chombo*hdf in a problem directory in bash, use a for loop:
user:~> for i in $(ls out/ch*hdf); do tar -czvf $i.tar.gz $i [&& rm $i]; done
The bracketed && rm $i is optional (leave out the square brackets when typing it). It deletes the original chombo file once its tarball has been written, saving space.
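As a minimal sketch of the same loop, the version below iterates over a glob rather than the output of ls, which keeps working if a filename contains spaces. The scratch directory and the two frame names are made up for the demonstration.

```shell
# Sketch: tar each matching file separately, using a glob instead of $(ls ...).
# The scratch directory and the frame names below are made up for this demo.
workdir=$(mktemp -d)
mkdir "$workdir/out"
echo "frame 1" > "$workdir/out/chombo00001.hdf"
echo "frame 2" > "$workdir/out/chombo00002.hdf"

cd "$workdir" || exit 1
for f in out/ch*hdf; do
    [ -e "$f" ] || continue               # skip the literal pattern if nothing matched
    tar -czf "$f.tar.gz" "$f" && rm "$f"  # remove the original only if tar succeeded
done

tarball_count=$(find out -name '*.tar.gz' | wc -l | tr -d ' ')
leftover_count=$(find out -name '*.hdf' | wc -l | tr -d ' ')
echo "tarballs: $tarball_count, originals left: $leftover_count"
cd - > /dev/null
rm -rf "$workdir"
```

Because rm sits behind &&, a failed tar leaves the original file in place.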
You can get more creative with this. Say you've tar'd files 1-10, and now want to do 20-30. Let's also say that what you're really interested in is the later files, so you'd like to tar in reverse order. The following uses seq to generate a list of numbers in reverse order, which are then converted to 5-digit integers via printf and export:
user:~> for i in $(seq 30 -1 20); do
          export num=`printf %5.5i $i`;
          tar -czvf chombo$num.hdf.tar.gz out/chombo$num.hdf [&& rm out/chombo$num.hdf];
        done
Note the backticks on the export statement. The above can all be given on one line but is broken up here for clarity. Of course, if you want them to tar in normal order, you can simplify the above, as
user:~> for i in {20..30}; do export ...
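The number formatting can be tried on its own before any files are touched. The sketch below assumes the chombo prefix from the example above and an arbitrary range of 12 down to 9. It uses $(...) in place of backticks and a plain shell variable in place of export, since the value is only needed inside the loop.

```shell
# Sketch: build zero-padded frame names in reverse order without export.
# The chombo prefix follows the example above; the range 12..9 is arbitrary.
names=""
for i in $(seq 12 -1 9); do
    num=$(printf '%05d' "$i")        # e.g. 9 -> 00009
    names="$names chombo$num.hdf"
done
names=${names# }                     # drop the leading space
echo "$names"
```

Here %05d pads with zeros to five digits, which gives the same result as %5.5i for non-negative integers.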
With passwordless SSH, you can add a quiet scp statement for each tarball: scp chombo$num.hdf.tar.gz user@host:location &>/dev/null.
Finally, here's a more complicated statement related to tar'ing Brick-of-value *.dat files:
user:~> for i in {0..35}; do export prefix="W_`printf %3.3i $i`" ; for j in $(ls $prefix*.dat); do tar -czvf $j.tar.gz $j && rm -v $j; done; done
Bash Shell Configuration: .bashrc and .bash_profile
Whenever you launch a bash shell via terminal, the shell environment is configured by the .bash_profile and .bashrc files in your home directory. The two files theoretically fulfill different roles, but the functionality they provide is very similar.
- The .bash_profile file is executed when you log in, be it through SSH, SFTP, or some other means. Basically, any launch that requires a username and password will execute the options in .bash_profile.
- The .bashrc file, in contrast, is automatically executed when a non-login interactive shell is launched. For instance, if you are logged directly into a Linux machine and open a terminal window on the desktop, then .bashrc will be used instead of .bash_profile.
In practice, it's better to keep all of your environment settings in one of the two files. Otherwise, you'll have to change two files in order to change your shell environment. If, for instance, a library path or module was changed in .bashrc and the change wasn't propagated to .bash_profile, then the new option might be unavailable for remote users (who log into the system, and therefore trigger .bash_profile).
For this reason, we usually put all of our environment configuration commands in .bashrc and just add some lines to .bash_profile that invoke .bashrc:
if [ -f ~/.bashrc ]; then
    source ~/.bashrc
fi
Aside from this, .bash_profile is best kept relatively empty. This ensures that .bash_profile doesn't contain any settings that might override the ones in .bashrc.
Command Aliasing
There are several tricks with .bashrc that you can use to make your life easier. The first is the alias command, which maps complex shell-executable expressions to simpler commands using the form:
alias <command>="<bash shell expression>"
Examples
To always enable X11 forwarding in SSH:
alias ssh="ssh -Y"
To make the command visit run the build of VisIt in /opt/visit/bin:
alias visit="/opt/visit/bin/visit"
To apply .bashrc changes without logging out:
source ~/.bashrc
Exporting Environment Variables
Another useful trick in .bashrc is environment variable export. By including lines of the form export VARIABLE_NAME=<variable_value>, we make the variable $VARIABLE_NAME accessible within the command-line environment.
This is especially useful when applied to the $PATH and $LD_LIBRARY_PATH variables. These pre-existing environment variables contain the paths Linux searches to look for executables and shared library objects, respectively.
Examples
export PATH=$PATH:/usr/local/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/hdf5/lib
Note the use of $PATH and $LD_LIBRARY_PATH on the right-hand side of these assignments. This appends the new path to any existing list of paths in the variable. Linux searches these paths in order, so if you want one path to be searched before the rest, put it first: export PATH=<new_path>:$PATH.
Execute source ~/.bashrc after editing .bashrc to implement changes.
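One side effect of re-sourcing .bashrc is that each pass appends the same directories again. A possible guard is sketched below; add_to_list is a hypothetical helper name, and demo_path stands in for $PATH so the example does not touch the real environment.

```shell
# Sketch: append a directory to a PATH-style list only if it is not already there.
# add_to_list is a hypothetical helper; demo_path stands in for $PATH.
add_to_list() {
    # $1 = current colon-separated list, $2 = directory to append
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;       # already present: leave unchanged
        *)        printf '%s\n' "$1:$2" ;;    # otherwise append
    esac
}

demo_path="/usr/bin:/bin"
demo_path=$(add_to_list "$demo_path" "/usr/local/hdf5/bin")
demo_path=$(add_to_list "$demo_path" "/usr/local/hdf5/bin")   # second call is a no-op
echo "$demo_path"
```

Wrapping the list in colons before matching avoids false hits on directories that merely share a prefix, such as /usr/local/hdf5/bin2.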
Persistent SCP connections
See the Persistent SCP page.
Regular Expressions in emacs
A regular expression (regex or regexp) is a special text string for describing a search pattern. The emacs editor supports regular expressions for finding and manipulating text.
Examples
I have several functions of the form C_name1(a,b,c), C_name2(a,b,c), etc., to rename with an extra underscore, i.e., C_name1_(a,b,c).
In emacs:
(M-x) replace-regexp
(1st prompt) \(C_\w*\)(
(2nd prompt) \1_(
In the first prompt (the to-be-replaced string), you can specify substrings using the \( and \) characters to group them. So I've wrapped the function name in such a group. The \w character will find any word character (no whitespace), and the * finds any number of them. It stops when it reaches the (, which is the next non-wildcard character in the to-be-replaced string.
Emacs allows you to have multiple groups in the to-be-replaced string, which you can reference in your replacement string. They are referenced left-to-right by \1, \2, etc. So here, I indicate that in the new string I want the first group (e.g. "C_name1") to come first, followed by a _(.
Conversely, if you want to pre/append an entire string (say, change all foot's to football's), you could use the \& character which represents the entire to-be-replaced string.
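For comparison outside the editor, the same two substitutions can be sketched with sed on sample strings. The bracket expression [A-Za-z0-9_]* is an assumption standing in for the emacs \w*, and the input lines are invented for the demonstration.

```shell
# Sketch: the same two substitutions done with sed on sample strings.
# [A-Za-z0-9_]* is used here in place of the emacs \w* (an assumption about
# which characters appear in the function names).
renamed=$(echo 'x = C_name1(a,b,c) + C_name2(a,b,c)' |
          sed 's/\(C_[A-Za-z0-9_]*\)(/\1_(/g')
echo "$renamed"

# & in the replacement stands for the entire match, like \& in emacs.
extended=$(echo 'foot' | sed 's/foot/&ball/')
echo "$extended"
```

As in emacs, \1 refers to the first \( \) group, so the function name is carried over unchanged and only the underscore is added.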
Monitoring the contents of a directory with 'watch'
Occasionally, a user wants to watch a directory to see when changes are made. One example is monitoring a problem directory on bg/p to notice when a submitted job starts running: even if the scheduler says the job started immediately, output is delayed somewhat.
Examples
This can be done with the watch command, e.g. in your .bashrc:
alias wa='watch -d -n 1 ls -lht 2>/dev/null'
This will reexamine the current directory every second ("-n 1"), highlight changes ("-d"), and quietly ignore errors ("2>/dev/null"). Thus it is easy for the user to see when, for example, their problem *.out file is created and starts getting written to. Conversely, they know immediately when a job has quit and dumped core files, saving them the task of polling the directory manually or checking their email for a job-quit message from the job scheduler.
Pushd/Popd
The pushd and popd commands allow Linux users to store directories on a stack and easily navigate between them.
Examples
Say you're debugging a problem module and you find that you're constantly switching between your problem directory (~/myprob) and the source directory (~/mycode/source). You can quickly bounce back and forth between the two with pushd and popd. pushd places your current directory and your destination directory onto a directory stack; you can subsequently alternate between the two by typing pushd without any arguments:
user:~/myprob> pushd ~/mycode/
user:~/mycode> cd source
user:~/mycode/source> pushd
user:~/myprob> pushd
user:~/mycode/source> wow!
popd removes the current directory from the directory stack and puts you into the other directory; in general it would only be used to clear the stack.
pushd sets an environment variable $OLDPWD. This lets you greatly shorten the command to go back and forth from compilation and running the code. For instance, say you were editing the code in one terminal, and all you're doing in a second terminal is recompiling and running the code. You can do this all on one line like the following:
user:~/myprob> pushd ~/mycode && make mpibear && cp mpibear $OLDPWD && popd && mpirun -n 2 ./mpibear
Note that the use of "&&" instead of ";" in between commands will make the sequence halt if one exits with an error code (e.g., if there's a problem with compilation).
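The difference between the two separators can be checked in isolation. In this small sketch, false stands in for a failing build step (an assumption made so that nothing has to be compiled).

```shell
# Sketch: 'false' stands in for a failing build step such as a broken make.
and_status=0
with_and=$(false && echo "next step ran") || and_status=$?   # && skips the echo
with_semicolon=$(false; echo "next step ran")                # ; runs it regardless
echo "with &&: '$with_and' (exit status $and_status)"
echo "with ;:  '$with_semicolon'"
```

With &&, the chain stops at the failure and reports a non-zero exit status, so a stale executable is never copied or run.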
Killing old defunct astrobear processes
Often when debugging you will end up with many astrobear processes that are defunct. Just run
ps axu | grep astrobear
to see a list of all the astrobear processes running on a machine. To kill your processes run
killall -e astrobear -s 9 -u yourusername
This will send signal 9 to all astrobear processes that you are currently running.
Changing the endianness of bov files from little to big
Occasionally you may end up generating data on a big-endian machine, but the description of the data in the .bov file says little endian. In order to view the data in VisIt you need to correct the endian flag in each .bov file.
mkdir out_temp
cp out/*.bov out_temp/
cd out_temp
for i in `ls *.bov`; do sed 's/LITTLE/BIG/' $i > ../out/$i; done
cd ../
rm -rf out_temp
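A variant that avoids the scratch copy of the whole directory is sketched below: each header is rewritten through a temporary file placed next to it. The header contents are a made-up stand-in for a real .bov file, and the scratch directory exists only for the demonstration.

```shell
# Sketch: rewrite each .bov header via a temporary file next to it.
# The header contents below are a made-up stand-in for a real .bov file.
workdir=$(mktemp -d)
printf 'DATA_FILE: frame0.dat\nDATA_ENDIAN: LITTLE\n' > "$workdir/frame0.bov"

for f in "$workdir"/*.bov; do
    # Replace the original only if sed succeeded.
    sed 's/LITTLE/BIG/' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done

endian_line=$(grep ENDIAN "$workdir/frame0.bov")
echo "$endian_line"
rm -rf "$workdir"
```

On real data, the loop would run over out/*.bov instead of the scratch directory.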
FORTRAN Command Line and Integer/String Read/Writes
See the FortranCommandLine page.
The Modules package
See the Modules page.
Scaling Tests scripts
Weak Scaling Scripts
For weak scaling we need to specify the dimensions of each problem and the number of processors we want to use. Then separate directories are created along with customized job scripts which are then submitted to the queue. The job scripts are customized by echoing modified PBS directives along with any other necessary variables needed by the pbs script (in this case nproc, base_resx, and base_resy)
#!/bin/bash
NP=(256 128 64 32 16 8)
base_resx=(724 512 362 256 180 128)
base_resy=(724 512 362 256 180 128)
for (( j=0;j<${#NP[@]};j++)); do
  nproc=${NP[j]}
  mkdir $nproc
  cp *.data $nproc
  cd $nproc
  mkdir out
  nodes=`expr $nproc / 8`
  echo "#!/bin/bash" > scrambler.pbs
  echo "#PBS -q debug" >> scrambler.pbs
  echo "#PBS -l nodes=$nodes:ppn=8,pvmem=1000mb,walltime=1:00:00" >> scrambler.pbs
  echo "#PBS -N weakscalingtest-$nproc" >> scrambler.pbs
  echo "nProcs=$nproc" >> scrambler.pbs
  echo "base_resx=${base_resx[j]}" >> scrambler.pbs
  echo "base_resy=${base_resy[j]}" >> scrambler.pbs
  cat ../scrambler.pbs >> scrambler.pbs
  # qsub scrambler.pbs
  cd ..
done
This then adds the following
#!/bin/bash
#PBS -q debug
#PBS -l nodes=1:ppn=8,pvmem=1000mb,walltime=1:00:00
#PBS -N weakscalingtest-8
nProcs=8
base_resx=128
base_resy=128
to the beginning of the default scrambler.pbs script
echo "==========="
echo "Running on:"
cat $PBS_NODEFILE
echo "==========="
cd $PBS_O_WORKDIR
f=(.25 .5 .75)
maxlevel=(0 1 2 3 4)
threaded=(-1 0)
mv data.out data.out.old
../subst.s Gmx $base_resx,$base_resy,1 global.data
../subst.s domain%mGlobal 1,1,1,$base_resx,$base_resy,1 global.data
for (( l=0;l<${#maxlevel[@]};l++)); do
  ../subst.s MaxLevel ${maxlevel[l]} global.data
  for (( k=0;k<${#threaded[@]};k++)); do
    ../subst.s iThreaded ${threaded[k]} global.data
    for (( i=0;i<${#f[@]};i++)); do
      ../subst.s filling_fractions ${f[i]} problem.data
      echo ${maxlevel[l]} ${threaded[k]} ${f[i]} ${NP[j]}
      mpirun -n $nProcs ../astrobear > output.out
      grep scale_data output.out >> data.out
    done
  done
done
Then when the job runs, the scrambler.pbs script can modify the various data files using the subst.s script to swap out different parameters and perform the different runs, all with the same number of processors. The only requirement is that the values of a namelist variable appear on the same line as the namelist variable.
For example
Gmx = 64,64,1 !Base resolution
instead of
Gmx = 64, ! cells in x
      64, ! cells in y
      1,  ! cells in z
Here is the subst.s script
#!/bin/bash
# "subst", a script that substitutes one pattern for
# another in a file,
# i.e., "subst Smith Jones letter.txt".

ARGS=3
E_BADARGS=65   # Wrong number of arguments passed to script.

if [ $# -ne "$ARGS" ]
# Test number of arguments to script (always a good idea).
then
  echo "Usage: `basename $0` variable new-value filename"
  exit $E_BADARGS
fi

var_pattern=$1
new_value=$2

if [ -f "$3" ]
then
  file_name=$3
else
  echo "File \"$3\" does not exist."
  exit $E_BADARGS
fi

# Here is where the heavy work gets done.
newfile="$file_name""2"
sed -e "s/$var_pattern\(\s*=\s*\)\S*/$var_pattern\1$new_value/i" $file_name | uniq > $newfile
mv $newfile $file_name
# I'm sure there is a better way then to pipe the output through unique but I got frustrated trying JJC

# 's' is, of course, the substitute command in sed,
# and /pattern/ invokes address matching.
# The (\s*=\s*\) matches (and saves) all of the white space on either side of the equals sign
# The \S* terminates at the first non-white space character which is presumably the end of the variable value
# Note for array definitions like qTolerance = 1e-3 1e-3 1e-3, this will replace only the first value.
# This can be avoided by editing your data files to remove white space in array declarations (ie qTolerance = 1e-3,1e-3,1e-3)

exit 0   # Successful invocation of the script returns 0.
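To see what the substitution does to a single namelist line, the core sed expression can be tried on a sample string. The sketch below swaps \s and \S for POSIX bracket classes and drops the case-insensitive i flag. Both changes are assumptions made for portability and are not part of the script above.

```shell
# Sketch: the core substitution from subst.s applied to one sample line.
# POSIX bracket classes replace \s and \S, and the case-insensitive 'i' flag
# is dropped; both changes are assumptions made for portability.
var_pattern="Gmx"
new_value="128,128,1"
line="Gmx = 64,64,1 !Base resolution"

updated=$(printf '%s\n' "$line" |
    sed "s/\(${var_pattern}[[:space:]]*=[[:space:]]*\)[^[:space:]]*/\1${new_value}/")
echo "$updated"
```

The trailing comment survives because the match ends at the first whitespace after the value. A new value that contains a / would break the sed expression, so such values would need a different delimiter.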
Projecting okc files into curve files for plotting particle positions over projections
Example
The awk line selects only odd-numbered lines starting at 31 and prints out the 2nd and 3rd fields. This is for projecting along x. To project along y or z, just change $2, $3 to $3, $1 or $1, $2, respectively.
Curve and okc files in a database always have to have 1 entry, which in the okc file is 0.0 0.0. The sed just switches these for -100 -100 so they don't appear in the window when projecting.
for i in $(ls sinks_*.okc); do echo "#yz" > $i.x.curve; cat $i | awk 'NR%2==1 && NR > 30 {print $2, $3}' | sed 's/0.0000000000000000E+00 0.0000000000000000E+00/-100 -100/g' >> $i.x.curve; done
Then open the .x.curve files in VisIt as Curve2D and plot the 'yz' data set (plot points only, not lines).
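The line selection can be checked on a mock file before running it over real data. In the sketch below, a 34-line file with made-up fields stands in for an okc file, with lines 1-30 playing the role of the header.

```shell
# Sketch: apply the awk filter to a mock 34-line file.
# Lines 1-30 play the role of the header; real okc files have other contents.
mock=$(mktemp)
n=1
while [ "$n" -le 34 ]; do
    echo "x$n y$n z$n" >> "$mock"
    n=$((n + 1))
done

along_x=$(awk 'NR%2==1 && NR > 30 {print $2, $3}' "$mock")   # keeps y and z
along_y=$(awk 'NR%2==1 && NR > 30 {print $3, $1}' "$mock")   # keeps z and x
echo "$along_x"
rm -f "$mock"
```

Only lines 31 and 33 pass the filter here, since they are the odd-numbered lines past line 30.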