HTCondor batch system

Please read the HTCondor user manual, specifically the "Submitting a Job" and "Managing a Job" chapters. The important commands are condor_status, condor_q, condor_submit, condor_hold, and condor_rm.

The first job

HTCondor works with a special "job description file" (also called a "condor command file") in which the user requests the set of resources required for a job. The job description file defines the type of the job (single core, parallel, etc.) and can be used to pass parameters to the job and to manage the transfer of input and output files.

In this example we will submit a user's script condorJob1.sh using the job description file condorJob1.cmd.

  • log in to one of the cluster interactive nodes (cms1, hpcm, or t3int0)
  • create a batch job project area:
    • mkdir -p /xdata/$USER/batch
    • cd /xdata/$USER/batch
  • Check the list of available nodes
    • condor_status
                      Machines Owner Claimed Unclaimed Matched Preempting

        SL6x64/LINUX      366     0       0       366       0          0
        SL7x64/LINUX      216     0       0       216       0          0

               Total      582     0       0       582       0          0
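The summary table can also be used from a script. As a toy illustration (the text below is a pasted sample of the table above, not live condor_status output; on the cluster, live data for one architecture can be requested with condor_status -constraint 'Arch == "SL7x64"'):

```shell
# Toy example: extract the Unclaimed (free) slot count for SL7x64 nodes
# from a pasted condor_status summary. Sample text, not live output.
summary='SL6x64/LINUX      366     0       0       366       0          0
SL7x64/LINUX      216     0       0       216       0          0'
# Column 5 of the summary table is the Unclaimed count.
unclaimed=$(echo "$summary" | awk '/^SL7x64/ {print $5}')
echo "Free SL7x64 slots: $unclaimed"
```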
  • copy condorJob1.cmd and condorJob1.sh to /xdata/$USER/batch
  • inspect the condor command file. It requires condorJob1.sh to run on any of the SL7x64 nodes in the condor "vanilla" universe (a single-core job); defines where to write the job's error, log, and output files (produced by the HTCondor system during the run); requests that all files produced by condorJob1.sh be transferred back to the current directory; and asks for e-mail notification at the end of the job. Do not forget to uncomment the "notify_user" line and provide a working e-mail address.
    • cat condorJob1.cmd
 #############################################################################                
 ##Condor command file example                                                                
 ##############################                                                                 
 #                                                                                            
 requirements              = (Arch == "SL7x64")                                               
 #                                                                                            
 #                                                                                             
 executable                = $ENV(PWD)/condorJob1.sh                                                        
 #                                                                                                          
 output                    = $ENV(PWD)/condorJob1.out                                                       
 error                     = $ENV(PWD)/condorJob1.err                                                       
 log                       = $ENV(PWD)/condorJob1.log
 #
 universe                  = vanilla
 should_transfer_files     = YES
 when_to_transfer_output   = ON_EXIT
 #
 #Uncomment and provide a working e-mail address
 #notify_user              = user@gmail.com
 notification              = always
 queue
 #
 #############################################################################
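Submit files can also pass command-line arguments to the executable and queue several jobs at once. The variant below is an illustrative sketch, not one of the tutorial files; it uses the standard HTCondor submit commands arguments, $(Process), and queue N:

```
# Hypothetical variant of condorJob1.cmd: pass the job number as an
# argument and queue three independent jobs. $(Process) expands to 0, 1, 2.
requirements              = (Arch == "SL7x64")
executable                = $ENV(PWD)/condorJob1.sh
arguments                 = $(Process)
output                    = $ENV(PWD)/condorJob1_$(Process).out
error                     = $ENV(PWD)/condorJob1_$(Process).err
log                       = $ENV(PWD)/condorJob1.log
universe                  = vanilla
should_transfer_files     = YES
when_to_transfer_output   = ON_EXIT
queue 3
```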
  • inspect the condorJob1.sh script (it says "Hello", prints the environment variables defined on the remote node at the time of the run, sleeps for 3 minutes, and creates two toy output files):
    • cat condorJob1.sh
 #A condor job script example 
 UNAME="$(id -nu)"
 SCRIPTNAME="condorJob1"
 JOBID="$(echo $_CONDOR_SCRATCH_DIR | sed 's/\// /g'| sed 's/\_/ /g' | awk '{print $NF}')"
 DATE="$(date | sed 's/:/ /g' | awk '{print $2$3"_"$4"_"$5"_"$6}')"
 JOBNAME="${SCRIPTNAME}_${JOBID}_${DATE}"
 ###
 # Here is a placeholder - just to have some output
 ##
 echo "Greetings $UNAME! (from CONDOR on node $HOSTNAME at $DATE)"
 echo "Job environment"
 env | sort
 #Pretend to be doing something for 3 minutes
 sleep 180
 #
 #Create some files
 echo "Here can be a job output  data"   >> ${JOBNAME}.root 
 echo "Here can be a job logging info"   >> ${JOBNAME}.info
 #
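The name-building logic at the top of condorJob1.sh can be seen in isolation with sample inputs. This is a self-contained illustration: the scratch-directory path and date string below are made-up sample values, not taken from a real job.

```shell
# Illustration of how condorJob1.sh builds its job name. Sample values only.
SCRATCH="/var/lib/condor/execute/dir_12345"   # shaped like $_CONDOR_SCRATCH_DIR
# Split the path on "/" and "_" and keep the last field: the numeric job id.
JOBID="$(echo "$SCRATCH" | sed 's/\// /g' | sed 's/_/ /g' | awk '{print $NF}')"

SAMPLE_DATE="Sat Sep 8 23:57:12 CDT 2018"     # shaped like `date` output
# Replace ":" with spaces, then glue month+day and the time fields together.
DATE="$(echo "$SAMPLE_DATE" | sed 's/:/ /g' | awk '{print $2$3"_"$4"_"$5"_"$6}')"

echo "condorJob1_${JOBID}_${DATE}"            # -> condorJob1_12345_Sep8_23_57_12
```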
  • submit the condor job
    • condor_submit condorJob1.cmd
  • check job status
    • condor_q -global

-Schedd: hpcm.nicadd.niu.edu : <192.168.100.2:9248?...
 ID   OWNER   SUBMITTED   RUN_TIME    ST PRI SIZE CMD
 1.0  user    9/8  23:57  0+00:00:19  R  0   0.0  condorJob1.sh

1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
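The ST column reports the job state: R means running, I idle, and H held. As a toy parse of the job line shown above (sample text, not live condor_q output):

```shell
# Sample condor_q job line as shown above (not live output).
job_line='1.0 user 9/8 23:57 0+00:00:19 R 0 0.0 condorJob1.sh'
# Field 6 is the ST (state) column: R = running, I = idle, H = held.
state=$(echo "$job_line" | awk '{print $6}')
echo "Job state: $state"
```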

  • when the job ends (wait 3 minutes, or check for the e-mail from the HTCondor system), inspect the job output files
    • ls -l condorJob1*
  • Read the HTCondor manual for more information about submitting a job

!! All-in-one vanilla job
!! Parallel (MPI) job


Page last modified on September 09, 2018, at 12:15 AM EST

Created: Sergey A. Uzunyan