HTCondor batch system
Please read the CONDOR user manual, specifically the "Submitting a Job" and "Managing a Job" chapters. The important commands are condor_status, condor_q, condor_submit, condor_hold, and condor_rm.
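Job management works on the job ID that condor_q reports (cluster.process, e.g. "1.0" — a hypothetical ID here). A typical management session might look like:

```
condor_hold 1.0       # put job 1.0 on hold
condor_release 1.0    # release the held job back into the queue
condor_rm 1.0         # remove the job from the queue
condor_rm -all        # remove all of your jobs
```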
The first job
CONDOR works with a special "job description file" (or "condor command file") in which the user requests the set of resources required for a job. The job description file defines the type of the job (single core, parallel, etc.) and can be used to pass parameters to the job and to manage the transfer of input and output files.
In this example we will submit a user's script condorJob1.sh using the job description file condorJob1.cmd.
- log in to one of the cluster interactive nodes (cms1, hpcm or t3int0)
- create a batch job project area :
- mkdir -p /xdata/$USER/batch
- cd /xdata/$USER/batch
- Check the list of available nodes
- condor_status
             Machines  Owner  Claimed  Unclaimed  Matched  Preempting

SL6x64/LINUX      366      0        0        366        0           0
SL7x64/LINUX      216      0        0        216        0           0

       Total      582      0        0        582        0           0
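condor_status also accepts ClassAd constraints, which is handy for checking that machines matching a submit file's "requirements" line actually exist. For example (the constraint below mirrors the requirements used in this tutorial):

```
condor_status -constraint 'Arch == "SL7x64"'   # only SL7x64 nodes
condor_status -avail                           # only machines able to run jobs now
```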
- copy condorJob1.cmd and condorJob1.sh to /xdata/$USER/batch
- inspect the condor command file. We require condorJob1.sh to run on any of the SL7x64 nodes in the Condor "vanilla" universe (a single-core job); define where to write the job error, log, and output files (produced by the CONDOR system during the run); request that all files produced by condorJob1.sh be transferred back to the current directory; and require e-mail notification at the end of the job. Do not forget to uncomment the "notify_user" line and provide a working e-mail address.
- cat condorJob1.cmd
#############################################################################
## Condor command file example
##############################
#
requirements = (Arch == "SL7x64")
#
executable = $ENV(PWD)/condorJob1.sh
#
output = $ENV(PWD)/condorJob1.out
error  = $ENV(PWD)/condorJob1.err
log    = $ENV(PWD)/condorJob1.log
#
universe = vanilla
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
#
# Uncomment and provide a working e-mail address
#notify_user = user@gmail.com
notification = always
queue
#
#############################################################################
- inspect the condorJob1.sh script (it only says "Hello", prints the environment variables defined on the remote node at run time, sleeps for 3 minutes, and creates two toy output files):
- cat condorJob1.sh
#!/bin/bash
# A condor job script example
UNAME="$(id -nu)"
SCRIPTNAME="condorJob1"
JOBID="$(echo $_CONDOR_SCRATCH_DIR | sed 's/\// /g' | sed 's/\_/ /g' | awk '{print $NF}')"
DATE="$(date | sed 's/:/ /g' | awk '{print $2$3"_"$4"_"$5"_"$6}')"
JOBNAME="${SCRIPTNAME}_${JOBID}_${DATE}"
###
# Here the placeholder - just to have some output
##
echo "Greetings $UNAME! (from CONDOR on node $HOSTNAME at $DATE)"
echo "Job environment"
env | sort
# Pretend to be doing something for 3 mins
sleep 180
#
# Create some files
echo "Here can be a job output data" >> ${JOBNAME}.root
echo "Here can be a job logging info" >> ${JOBNAME}.info
#
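The JOBID line above digs the numeric suffix out of $_CONDOR_SCRATCH_DIR, which on the execute node looks like .../dir_<number>. The same extraction can be done more compactly with basename; a small sketch using a hard-coded example path ($_CONDOR_SCRATCH_DIR is only set on the worker node, so the path here is hypothetical):

```shell
#!/bin/bash
# Stand-in for $_CONDOR_SCRATCH_DIR; only defined on the execute node.
SCRATCH="/var/lib/condor/execute/dir_12345"

# Take the last path component and keep the part after the final "_".
JOBID="$(basename "$SCRATCH" | awk -F_ '{print $NF}')"
echo "$JOBID"
```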
- submit condor job
- condor_submit condorJob1.cmd
- check job status
- condor_q -global
-- Schedd: hpcm.nicadd.niu.edu : <192.168.100.2:9248?...
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
1.0 user 9/8 23:57 0+00:00:19 R 0 0.0 condorJob1.sh
1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
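While the job is in the queue, condor_q can give more detail than the summary above (the job ID 1.0 is whatever your own submission reported):

```
condor_q 1.0            # status of a single job
condor_q -analyze 1.0   # why a job is idle / which machines it matches
condor_q -long 1.0      # full ClassAd of the job
```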
- when the job ends (wait 3 minutes; check the e-mail from the CONDOR system), inspect the job output files
- ls -l condorJob1*
- Read the CONDOR manual for more info about Submitting a Job