
Version 29 (modified by Baowei Liu, 12 years ago)

Linux-Fu

Got something that helps you use Linux? Know of a neat trick, or just figure something out? Let everyone know about it here.

Command Line Reference

Click here for a more complete listing of Linux commands, grouped by purpose.

Emacs — quick reference card.

vim — quick reference card.

Linux — quick reference card.

Moving data from simulation directories to longer term storage

  • Just navigate to your simulation directory

cd /scratch/johndoe/MySimulation1

  • And then run rsync

rsync -avz ./ user@host:/data/SimulationDir

  • You can also detach the process from the terminal by using nohup…
    nohup rsync -avz ./ user@host:/data/SimulationDir
    <enter password >
    <ctrl+z>
    bg %1
    logout
    
  • If you come back later and there are more frames, rerun the same command; rsync will transfer only the files that are new or have changed.

Individually Tarballing Files

It is often useful to tarball files before transferring them between compute clusters and local storage. Large files should be rolled into separate tarballs to improve transfer efficiency.

Examples

To tarball each chombo*hdf file in a problem directory in bash, use a for loop:

user:~> for i in out/ch*hdf; do tar -czvf "$i.tar.gz" "$i" [&& rm "$i"]; done

The bracketed && rm "$i" is optional; type it without the brackets to delete each original chombo file once its tarball has been written, saving space.
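As a rough illustration of the loop above, here is a self-contained sketch that can be run safely in a scratch directory. The dummy file names and the extra `tar -tzf` listing check are assumptions added for the demo, not part of the original recipe.

```shell
# Sketch: tar each matching file on its own, and delete the original only
# if the archive was written and can be listed back.
# The scratch directory and dummy frames below are made up for the demo.
workdir=$(mktemp -d)
mkdir "$workdir/out"
for n in 00001 00002 00003; do
  echo "dummy frame $n" > "$workdir/out/chombo$n.hdf"
done

for f in "$workdir"/out/ch*hdf; do
  [ -e "$f" ] || continue                # the glob matched nothing
  dir=$(dirname "$f")
  base=$(basename "$f")
  # -C makes tar store the bare file name rather than the full path
  tar -czf "$f.tar.gz" -C "$dir" "$base" &&
    tar -tzf "$f.tar.gz" > /dev/null &&
    rm "$f"
done

ls "$workdir/out"
```

Iterating over the glob directly, rather than over the output of ls, keeps file names with spaces intact.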

You can get more creative with this. Say you've tar'd files 1-10, and now want to do 20-30. Let's also say that what you're really interested in is the later files, so you'd like to tar in reverse order. The following uses seq to generate a list of numbers in reverse order, which are then converted to 5-digit integers via printf and export:

user:~> for i in $(seq 30 -1 20); do 
  export num=`printf %5.5i $i`; 
  tar -czvf chombo$num.hdf.tar.gz out/chombo$num.hdf [&& rm out/chombo$num.hdf]; 
  done

Note the backticks on the export statement. The above can all be given on one line but is broken up here for clarity. Of course, if you want them to tar in normal order, you can simplify the above, as

user:~> for i in {20..30}; do export ...
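The zero-padding step can be tried on its own before any files are touched. The sketch below uses $(...) in place of backticks and drops export, which should not be needed here because the value is only expanded by the shell itself; the range 30 to 28 is just a short example.

```shell
# Sketch: build zero-padded frame names for a descending range.
# "chombo" and the 5-digit width follow the examples above.
names=""
for i in $(seq 30 -1 28); do
  num=$(printf '%05d' "$i")                 # 30 becomes 00030
  names="${names:+$names }chombo$num.hdf"   # space-separated list
done
echo "$names"
```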

With passwordless SSH, you can add a quiet scp statement for each tarball, scp chombo$num.hdf.tar.gz user@host:location &>/dev/null.

Finally, here's a more complicated statement related to tar'ing Brick-of-value *.dat files:

user:~> for i in {0..35}; do 
  export prefix="W_`printf %3.3i $i`" ;
  for j in $(ls $prefix*.dat); do 
    tar -czvf $j.tar.gz $j && rm -v $j;
  done;
done

Bash Shell Configuration: .bashrc and .bash_profile

Whenever you launch a bash shell via terminal, the shell environment is configured by the .bash_profile and .bashrc files in your home directory. The two files theoretically fulfill different roles, but the functionality they provide is very similar.

  • The .bash_profile file is executed when you log in, be it through SSH, SFTP, or some other means. Basically, any launch that requires a username and password will execute the options in .bash_profile.
  • The .bashrc file, in contrast, is automatically executed when a non-login interactive shell is launched. For instance, if you are logged directly into a Linux machine and open a terminal window on the desktop, then .bashrc will be used instead of .bash_profile.

In practice, it's better to keep all of your environment settings in one of the two files. Otherwise, you'll have to change two files in order to change your shell environment. If for instance, a library path or module was changed in .bashrc and the change wasn't propagated to .bash_profile, then the new option might be unavailable for remote users (who log into the system, and therefore trigger .bash_profile).

For this reason, we usually put all of our environment configuration commands in .bashrc and just add some lines to .bash_profile that invoke .bashrc:

if [ -f ~/.bashrc ]; then
        source ~/.bashrc
fi

Aside from this, .bash_profile is best kept relatively empty. This ensures that .bash_profile doesn't contain any settings that might override the ones in .bashrc.


Command Aliasing

There are several tricks with .bashrc that you can use to make your life easier. The first is the alias command, which maps complex shell-executable expressions to simpler commands using the form:

alias <command>="<bash shell expression>"

Examples

To always enable X11 forwarding in SSH:

alias ssh="ssh -Y"

To specify the build of VisIt in /opt/visit/bin to execute on the command visit:

alias visit="/opt/visit/bin/visit"

To apply .bashrc changes without logging out:

source ~/.bashrc

Exporting Environment Variables

Another useful trick in .bashrc is environment variable export. By including lines of the form export VARIABLE_NAME=<variable_value>, we make the variable $VARIABLE_NAME accessible within the command-line environment.

This is especially useful when applied to the $PATH and $LD_LIBRARY_PATH variables. These pre-existing environment variables contain the paths Linux searches to look for executables and shared library objects, respectively.

Examples

export PATH=$PATH:/usr/local/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/hdf5/lib

Note the use of $PATH and $LD_LIBRARY_PATH on the right-hand side of these assignments. This appends the new path to the list already stored in the variable. Linux searches these paths in order, so if you want a path to take precedence over the rest, put it first: export PATH=<new_path>:$PATH.

Execute source ~/.bashrc after editing .bashrc to implement changes.
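One side effect of re-sourcing .bashrc is that each plain export PATH=... line adds its directory again. A possible way around this is a small guard function; path_prepend is a hypothetical helper name and /opt/demo-tool/bin is a made-up directory used only for illustration.

```shell
# path_prepend is a hypothetical helper, not a standard command.
# It puts a directory at the front of PATH unless it is already listed.
path_prepend() {
  case ":$PATH:" in
    *":$1:"*) ;;                       # already present, leave PATH alone
    *) PATH="$1${PATH:+:$PATH}" ;;
  esac
  export PATH
}

path_prepend /opt/demo-tool/bin        # made-up directory for the demo
path_prepend /opt/demo-tool/bin        # second call changes nothing
echo "$PATH"
```

The same pattern should work for LD_LIBRARY_PATH, with the variable name changed throughout.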

Persistent SCP connections

See the Persistent SCP page.

Regular Expressions in emacs

A regular expression (regex or regexp) is a special text string for describing a search pattern. The emacs editor supports regular expressions for finding and manipulating text.

Examples

I have several functions of the form C_name1(a,b,c), C_name2(a,b,c), etc., to rename with an extra underscore, i.e., C_name1_(a,b,c).

In emacs:

(M-x) replace-regexp
(1st prompt) \(C_\w*\)(
(2nd prompt) \1_(

In the first prompt (the to-be-replaced string), you can specify substrings using the \( and \) characters to group them. So I've wrapped the function name in such a group. The \w character will find any word character (no whitespace), and the * finds any number of them. It stops when it reaches the (, which is the next non-wildcard character in the to-be-replaced string.

Emacs allows you to have multiple groups in the to-be-replaced string, which you can reference in your replacement string. They are referenced left-to-right by \1, \2, etc. So here, I indicate that in the new string I want the first group (e.g. "C_name1") to come first, followed by a _(.

Conversely, if you want to pre/append an entire string (say, change all foot's to football's), you could use the \& character which represents the entire to-be-replaced string.
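The same grouping and back-reference ideas carry over to sed, which can be handy for trying a pattern on one line before applying it in the editor. This is a sed analogue rather than emacs syntax: the character class [[:alnum:]_] stands in for \w (and also accepts underscores), and sed writes the whole match as & where emacs uses \&.

```shell
# \( \) captures a group, \1 reuses it, & stands for the whole match.
renamed=$(echo 'call C_name1(a,b,c)' | sed 's/\(C_[[:alnum:]_]*\)(/\1_(/g')
echo "$renamed"

appended=$(echo 'foot' | sed 's/foot/&ball/')
echo "$appended"
```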

Monitoring the contents of a directory with 'watch'

Occasionally, a user wants to be able to watch a directory so that they can see when changes are made. An example is monitoring a problem directory on bg/p, in order to notice when a submitted job starts running since even if it says it starts immediately, output is delayed somewhat.

Examples

This can be done with the watch command, e.g. in your .bashrc:

alias wa='watch -d -n 1 ls -lht 2>/dev/null'

This will reexamine the current directory every second ("-n 1"), highlight changes ("-d"), and quietly ignore errors ("2>/dev/null"). Thus it is easy for the user to see when, for example, their problem *.out file is created and starts getting written to. Conversely, they know immediately when a file has quit and dumped core files, saving them the task of polling the directory manually or checking their email for a job-quit email message from the job scheduler.

Pushd/Popd

The pushd and popd set of commands allows Linux users to store directories on a stack and easily navigate between them.

Examples

Say you're debugging a problem module and you find that you're constantly switching between your problem directory (~/myprob) and the source directory (~/mycode/source). You can quickly bounce back and forth between the two with pushd and popd. pushd places your current directory and your destination directory onto a directory stack; you can subsequently alternate between the two by typing pushd without any arguments:

user:~/myprob> pushd ~/mycode/
user:~/mycode> cd source
user:~/mycode/source> pushd
user:~/myprob> pushd
user:~/mycode/source> wow!

popd removes the current directory from the directory stack and puts you into the other directory; in general it would only be used to clear the stack.

pushd sets an environment variable $OLDPWD. This lets you greatly shorten the command to go back and forth from compilation and running the code. For instance, say you were editing the code in one terminal, and all you're doing in a second terminal is recompiling and running the code. You can do this all on one line like the following:

user:~/myprob> pushd ~/mycode && make mpibear && cp mpibear $OLDPWD && popd && mpirun -n 2 ./mpibear

Note that the use of "&&" instead of ";" in between commands will make the sequence halt if one exits with an error code (e.g., if there's a problem with compilation).
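The difference between the two separators can be seen without compiling anything. In this sketch, false stands in for a failed make, and each chain runs in its own sh so that the failure does not affect the calling shell.

```shell
# "false" plays the role of a build step that exits with an error code.
and_chain=$(sh -c 'false && echo ran' || true)   # echo is skipped
semi_chain=$(sh -c 'false ; echo ran')           # echo runs regardless

echo "with &&: [$and_chain]"
echo "with ; : [$semi_chain]"
```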

Killing old defunct astrobear processes

Often when debugging you will end up with many astrobear processes that are defunct. Just run

ps axu | grep astrobear 

to see a list of all the astrobear processes running on a machine. To kill your processes run

killall -e astrobear -s 9 -u yourusername

This will send signal 9 to all astrobear processes that you are currently running.

Changing the endianness of bov files from little to big

Occasionally you may end up generating data on a big endian machine, but the description of the data in the .bov file says little endian. In order to view the data in visit you need to correct the endian flag in each .bov file.

mkdir out_temp
cp out/*.bov out_temp/
cd out_temp
for i in *.bov; do sed 's/LITTLE/BIG/' "$i" > "../out/$i"; done
cd ../
rm -rf out_temp
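A variant that avoids the temporary directory is sketched below on a throwaway file. The two-line header is a made-up minimal example for illustration (the DATA_ENDIAN line is the only part that matters here), so check it against a real .bov file before relying on it.

```shell
# Sketch: flip the endian flag in every .bov header in place.
# The header written here is a dummy, not real simulation output.
demo=$(mktemp -d)
printf 'DATA_FILE: chombo00001.dat\nDATA_ENDIAN: LITTLE\n' > "$demo/chombo00001.bov"

for f in "$demo"/*.bov; do
  [ -e "$f" ] || continue
  # Write to a temporary file first so a failed sed cannot truncate the header
  sed 's/LITTLE/BIG/' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done

cat "$demo/chombo00001.bov"
```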

FORTRAN Command Line and Integer/String Read/Writes

See the FortranCommandLine page.

The Modules package

See the Modules page.

Scaling Tests scripts

Weak Scaling Scripts

For weak scaling we need to specify the dimensions of each problem and the number of processors we want to use. Separate directories are then created along with customized job scripts, which are submitted to the queue. The job scripts are customized by echoing modified PBS directives, along with any other variables the PBS script needs (in this case nProcs, base_resx, and base_resy).

#!/bin/bash
NP=(256 128 64 32 16 8)
base_resx=(724 512 362 256 180 128)
base_resy=(724 512 362 256 180 128)
for (( j=0;j<${#NP[@]};j++)); do
  nproc=${NP[j]}
  mkdir $nproc
  cp *.data $nproc
  cd $nproc
  mkdir out
  nodes=`expr $nproc / 8`
  echo "#!/bin/bash" > scrambler.pbs
  echo "#PBS -q debug" >> scrambler.pbs
  echo "#PBS -l nodes=$nodes:ppn=8,pvmem=1000mb,walltime=1:00:00" >> scrambler.pbs
  echo "#PBS -N weakscalingtest-$nproc" >> scrambler.pbs
  echo "nProcs=$nproc" >> scrambler.pbs
  echo "base_resx=${base_resx[j]}" >> scrambler.pbs
  echo "base_resy=${base_resy[j]}" >> scrambler.pbs
  cat ../scrambler.pbs >> scrambler.pbs
#  qsub scrambler.pbs
  cd ..
done

This then adds the following

#!/bin/bash
#PBS -q debug
#PBS -l nodes=1:ppn=8,pvmem=1000mb,walltime=1:00:00
#PBS -N weakscalingtest-8
nProcs=8
base_resx=128
base_resy=128

to the beginning of the default scrambler.pbs script

echo "==========="
echo "Running on:"
cat $PBS_NODEFILE
echo "==========="

cd $PBS_O_WORKDIR

f=(.25 .5 .75)
maxlevel=(0 1 2 3 4)
threaded=(-1 0)
mv data.out data.out.old
../subst.s  Gmx $base_resx,$base_resy,1 global.data
../subst.s  domain%mGlobal 1,1,1,$base_resx,$base_resy,1 global.data

for (( l=0;l<${#maxlevel[@]};l++)); do
  ../subst.s MaxLevel ${maxlevel[l]} global.data
  for (( k=0;k<${#threaded[@]};k++)); do
    ../subst.s iThreaded ${threaded[k]} global.data
    for (( i=0;i<${#f[@]};i++)); do
      ../subst.s filling_fractions ${f[i]} problem.data
      echo ${maxlevel[l]} ${threaded[k]} ${f[i]} $nProcs
      mpirun -n $nProcs ../astrobear > output.out
      grep scale_data output.out >> data.out
    done
  done
done

Then when the job runs, the scrambler.pbs script can modify the various data files using the subst.s script to swap out different parameters and perform the different runs, all with the same number of processors. The only requirement is that the value of a namelist variable appears on the same line as the namelist variable.

For example

Gmx = 64,64,1  !Base resolution

instead of

Gmx  =  64,  ! cells in x
        64,  ! cells in y
         1,  ! cells in z

Here is the subst.s script

#!/bin/bash
# "subst", a script that substitutes one pattern for
# another in a file,
# i.e., "subst Smith Jones letter.txt".
ARGS=3
E_BADARGS=65   # Wrong number of arguments passed to script.
if [ $# -ne "$ARGS" ]
# Test number of arguments to script (always a good idea).
then
  echo "Usage: `basename $0` variable new-value filename"
  exit $E_BADARGS
 fi

 var_pattern=$1
 new_value=$2

 if [ -f "$3" ]
 then
     file_name=$3
 else
     echo "File \"$3\" does not exist."
     exit $E_BADARGS
 fi

 # Here is where the heavy work gets done.
newfile="$file_name""2"

 sed -e "s/$var_pattern\(\s*=\s*\)\S*/$var_pattern\1$new_value/i" $file_name | uniq > $newfile
mv $newfile $file_name

 # I'm sure there is a better way than to pipe the output through uniq but I got frustrated trying JJC
 # 's' is, of course, the substitute command in sed,
 # and /pattern/ invokes address matching.
 # The \(\s*=\s*\) matches (and saves) all of the white space on either side of the equals sign
 # The \S* matches the non-white space characters of the old value and stops at the first white space character, which is presumably the end of the variable value
 # Note for array definitions like qTolerance = 1e-3 1e-3 1e-3, this will replace only the first value.
 # This can be avoided by editing your data files to remove white space in array declarations (ie qTolerance = 1e-3,1e-3,1e-3)

 exit 0    # Successful invocation of the script returns 0.
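To see what the substitution does to a single namelist line, the sed expression can be tried on a throwaway file. This sketch is a simplified stand-in for subst.s, not a replacement: it uses the POSIX classes [[:space:]] in place of \s and \S, it is case-sensitive, and the file contents are invented for the demo.

```shell
# Sketch of the substitution subst.s performs, on a temporary file.
demo_file=$(mktemp)
printf 'Gmx = 64,64,1  !Base resolution\nMaxLevel = 2\n' > "$demo_file"

var=Gmx
new_value=128,128,1
# Keep the spacing around "=" (group 1) and replace only the old value
sed -e "s/$var\([[:space:]]*=[[:space:]]*\)[^[:space:]]*/$var\1$new_value/" \
  "$demo_file" > "$demo_file.new" && mv "$demo_file.new" "$demo_file"

cat "$demo_file"
```

Because the new value is pasted straight into the sed expression, values containing / or & would need escaping first.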

Projecting okc files into curve files for plotting particle positions over projections

Example

The awk line selects only odd-numbered lines starting at line 31 and prints out the 2nd and 3rd fields. This is for projecting along x. To project along y or z, just change $2, $3 to $3, $1 or $1, $2. Curve and okc files in a database always have to have 1 entry, which in the okc file is 0.0 0.0. The sed just switches these for -100 -100 so they don't appear in the window when projecting.

for i in $(ls sinks_*.okc); do 
  echo "#yz" > $i.x.curve; 
  cat $i | awk 'NR%2==1 && NR > 30  {print $2, $3}' | sed 's/0.0000000000000000E+00 0.0000000000000000E+00/-100 -100/g' >> $i.x.curve; 
done

Then open the .x.curve files in visit as Curve2D and plot the 'yz' data set (don't plot lines - just points)
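To check which lines and fields the awk filter keeps, it can be run on a fake file first. The 36-line file below is invented for the demo, with line N reading "xN yN zN"; real okc files have a different header and layout.

```shell
# Sketch: build a fake 36-line file, then apply the same awk filter.
fake=$(mktemp)
awk 'BEGIN { for (n = 1; n <= 36; n++) print "x" n, "y" n, "z" n }' > "$fake"

# Projection along x: keep odd lines after line 30, print fields 2 and 3
selected=$(awk 'NR % 2 == 1 && NR > 30 { print $2, $3 }' "$fake")
echo "$selected"
```

Only lines 31, 33, and 35 pass the filter, and each contributes its y and z fields.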
