Doing Grid Calculations In Parallel

If one wants to generate a grid of energy points (perhaps to be fit to a potential energy surface, plotted in some way, etc.), there is a provision for doing such calculations in parallel. This section of the manual documents the required keywords as well as a general strategy (complete with shell scripts) for doing this efficiently.

First, after the grid is identified and the appropriate ZMAT file is constructed, add the keyword GRID_ALGORITHM=PARALLEL. Then, simply run the xjoda executable, and a number of files (zmatxxxxx, where xxxxx runs from 00001 to the number of points in your grid) will be generated. These contain the coordinates for each point of the grid and can, of course, be run independently.
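With that keyword in place, this first step amounts to something along the following lines (the file names shown are just the pattern described above):

xjoda          # reads ZMAT and writes the zmatxxxxx point files
ls zmat*       # zmat00001 zmat00002 ... one file per grid point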

One easy way to submit all the jobs and collect the information that is needed for processing the results is to use a run.dummy file of the sort below:


run.dummy script:

#!/bin/csh
# queueing-system directives: edit these for your particular scheduler
#$ -S /bin/csh
#$ -N job.xxxxx
#PBS -l nodes=1
#$ -l vf=500M
# locations of the program, basis sets, work directory, and scratch space
set codedir=(~stanton/git/aces2)
set basisdir=(~stanton/git/aces2/basis)
set workdir=(~stanton/n2co/excite/ccsdt/anharm/play)
set tmpdir=(/scr/stanton/job.xxxxx)
set path=(~/git/aces2/bin $path)
# run the point in a scratch directory and copy the results back
mkdir $tmpdir
cd $tmpdir
rm *
cp $workdir/zmatxxxxx ZMAT
cp $basisdir/GENBAS GENBAS
xaces2 > $workdir/out.xxxxx
xja2fja
cp FJOBARC $workdir/fja.xxxxx
cd ..
rm -R $tmpdir

Note that the queueing-system directives at the top of the script are specific to your particular queueing system, while the set lines depend on how and where you have built the program (which will rarely change) and on the directory that holds the ZMAT file with the optimized geometry (which will change from project to project). What really matters are the remaining lines and the xxxxx placeholders, since run.dummy serves as the template that the script below expands into one run script per grid point:

Now, use the following simple script to convert your run.dummy file into appropriate scripts for running all of the calculations:

sub script:

#!/bin/bash
# create one run script per grid point by substituting the point index
# (the digits after "zmat") for the xxxxx placeholder in run.dummy
for a in zmat*
do
  sed s/xxxxx/${a:4}/g < run.dummy > run${a:4}
done
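Running this (for example with bash sub) leaves one run script per point next to the zmat files:

bash sub
ls run[0-9]*   # run00001 run00002 ... one per zmat file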

Now, simply submit the "runxxxxx" scripts to your queueing system (for example, as sketched below), relax for a while, have a coffee (if the calculations are quick), or perhaps take a vacation (if they each take a long time). At some point in the not-too-distant future, the calculations will all be done, and the corresponding fja.xxxxx files will be present in the directory that holds the input files and the various run scripts.
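How the run scripts are submitted depends on the scheduler; assuming one whose submission command is qsub (as for SGE or PBS), a simple loop of the following sort does the job:

for f in run[0-9]*
do
  qsub $f
done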
To process the fja.xxxxx files and produce a grid.log file identical to what would have been produced by a serial (single-submission) grid calculation, run the following script.

pgrid_proc script:

#!/bin/bash
# check that every grid point produced an FJOBARC file
i=good
for a in zmat*
do
  if test -f fja.${a:4}
  then
    continue
  else
    echo "Problem: Point number ${a:4} did not complete successfully!"
    i=bad
  fi
done
if test "$i" = "bad"
then
  echo "Script terminates. No grid.log file produced."
else
  echo "All fja files are present."
  echo "Generating grid.log file"
  xclean
  rm grid.*
  xjoda >& /dev/null
  # feed each point's FJOBARC back to xjoda to rebuild grid.log
  for a in zmat*
  do
    cp fja.${a:4} FJOBARC
    xja2fja >& /dev/null
    xjoda >& /dev/null
  done
  echo "grid.log file generated and in place"
fi

At this point, a grid.log file containing the energy computed at each grid point will be present in the working directory.

Of course, it is also possible to compute other quantities on a grid: dipole moments, coupling constants, vibrational frequencies, etc. Such calculations can certainly be done with a procedure similar to that outlined here, but they require modification of the xja2fja executable module; advice for doing so can be obtained from the developers.

An alternative pgrid_proc.sh, which appears to be significantly faster:


#!/bin/bash
# collect the energy from each out.* file and pair it with the
# coordinates stored in grid.xyz; results go to standard output

lim=`ls zmat* | grep -v sh | tail -n 1 | sed 's/[a-z]//g'`
count=1
first=`ls zmat* | head -n 1 | sed 's/[a-z]//g'`

# determine some numbers
ncoord=`grep %grid ZMAT -A1 | tail -n1`
natom=`grep atoms out.$first | awk '{print $2}'`

for i in `seq -w 1 1 $lim`; do
  E=""   # reset so a stale energy from the previous point is never reused
  if [ -e out.$i ]; then
    E=`grep final out.$i | tail -n 1 | awk '{print $6}'`
    #E=`tail -n 2 out.$i | head -n 1 | awk '{print $6}'`
    if [ -z "$E" ]; then
      echo "Point $i did not complete successfully! Check it and try again."
    fi
  fi
  if [ ! -z "$E" ]; then
    # print the coordinates of this point (one line of grid.xyz), then the energy
    #head -n $count grid.xyz | tail -n 1 | awk 'ORS=" "{ for (i=2;i<=NF;i+=1) print $i }'
    sed -n ''"$count"'{p;q}' grid.xyz | awk 'ORS=" "{ for (i=2;i<=NF;i+=1) print $i }'
    echo $E
  fi
  count=`expr $count + $natom + 1`
done
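Since this version writes the coordinate and energy of each point to standard output rather than assembling a grid.log file, one might invoke it as follows (the output file name grid.dat is arbitrary):

bash pgrid_proc.sh > grid.dat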

Finally, here is a script for an arbitrary grid in internal coordinates; it generates zmat files for an arbitrary list of geometries. The input is a list of geometries, with the coordinate names on the first line (see the sketch below). Caution: avoid name conflicts between coordinate names and CFOUR keywords, since the script substitutes a new value on every ZMAT line that begins with a coordinate name.
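A minimal sketch of such a list-of-points file, using hypothetical coordinate names R1 and A1 (these must match the variable names used in the template ZMAT):

R1 A1
1.10 104.5
1.15 104.5
1.20 110.0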

#!/bin/bash

# reads a file and generates a zmat at every geometry
# Needs a template ZMAT file in the same directory

while getopts ":r:h" opt; do
  case $opt in
    h)
      echo "Usage: -r list-of-points-file, where the file begins with a coordinate list."
      exit
      ;;
    r)
      rlog=`cat $OPTARG`
      ;;
    \?)
      echo "Invalid option: -$OPTARG" >&2
      ;;
  esac
done
if [ $OPTIND -eq 1 ]; then echo "No options were passed. Try -h first."; exit; fi

# get the coordinate names and their number, drop the header line, and count the jobs
clist=`echo "$rlog" | head -n 1`
nc=`echo $clist | awk '{ print NF }'`
rlog=`echo "$rlog" | tail -n +2`
num=`echo "$rlog" | wc -l`

for i in `seq -w 1 1 $num`; do
  geom=`echo "$rlog" | sed -n ''"$i"'{p;q}'`
  for j in `seq 1 1 $nc`; do
    val=`echo $geom | awk '{print $'"$j"'}'`
    crd=`echo $clist | awk '{print $'"$j"'}'`
    # change this coordinate in the template and write the result back to ZMAT
    /usr/bin/awk '/^'"$crd"'/{gsub($3, "'"$val"'")}; {print}' ZMAT > zmat.tmp
    mv zmat.tmp ZMAT
  done
  cp ZMAT zmat$i
  if [ -e run.dummy ]; then
    # note: the placeholder in run.dummy is taken to be 00001 here (cf. xxxxx above)
    sed s/00001/$i/g < run.dummy > run$i
  fi
done
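A possible invocation, assuming the script has been saved as make_grid.sh and the list of geometries as points.dat (both names are hypothetical):

chmod +x make_grid.sh
./make_grid.sh -r points.dat
ls zmat*   # one zmat file (and, if run.dummy is present, one run script) per geometry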
