ANL Cluster

Authentication/get started

ANL cluster - ALCF

System

thetagpusn1: Welcome to Ubuntu 18.04.5 LTS (GNU/Linux 4.15.0-112-generic x86_64)

Failed to connect to https://changelogs.ubuntu.com/meta-release-lts. Check your Internet connection or proxy settings

2019 report

Disk organization

Columns (as printed by df -h): filesystem, size, used, available, use%, mount point

  • /dev/sda2 2.0G 938M 913M 51% /boot
  • /dev/sdb1 9.8G 23M 9.2G 1% /var/crash
  • /dev/sda4 126G 6.4G 120G 6% /tmp
  • /dev/sdb2 408G 151G 257G 37% /var/opt/cray/persistent
  • /dev/loop1 21G 21G 0 100% /var/opt/cray/imps-image-binding/PE/squash_mounts/squashfs_sDWXKu_mount_point
  • /dev/loop2 2.5G 2.5G 0 100% /var/opt/cray/imps-image-binding/diags/squash_mounts/squashfs_0S43Gm_mount_point
  • 172.22.10.92@o2ib:172.22.10.93@o2ib:/snx11214 9.2P 8.1P 1.1P 89% /lus/theta-fs0
  • 172.22.12.130@o2ib21:172.22.12.131@o2ib21:/grand 91P 39P 52P 43% /lus/grand
  • 172.22.12.2@o2ib22:172.22.12.3@o2ib22:/eagle 91P 35P 56P 39% /lus/eagle
  • rhino-01-infra.tmi.alcf.anl.gov:/pbs/theta 6.0G 5.1G 956M 85% /pbs
  • rhino-01-infra.tmi.alcf.anl.gov:/xalt/theta 434G 345G 89G 80% /var/log/xalt
  • mira-home 1.1P 80T 947T 8% /gpfs/mira-home
  • theta-fs1 7.9P 6.4P 1.6P 81% /gpfs/theta-fs1
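
The Use% column is the quickest thing to watch in this list (theta-fs0 is at 89%). A minimal sketch of a filter that prints only the mount points at or above a threshold; `mounts_over` is a hypothetical helper name, and it assumes df output with one filesystem per line and no spaces in mount points.

```shell
#!/usr/bin/env bash
# Sketch: print "Use% mount-point" for every filesystem at or above a threshold.
# Input: df-style lines on stdin (Filesystem Size Used Avail Use% Mounted-on).
# Assumes one filesystem per line (use df -P) and no spaces in mount points.
mounts_over() {
  local limit="$1"
  awk -v limit="$limit" '
    # Use% is the second-to-last field; header and blank lines do not match
    NF >= 2 && $(NF-1) ~ /^[0-9]+%$/ {
      pct = $(NF-1)
      sub(/%$/, "", pct)
      if (pct + 0 >= limit + 0) print pct "%", $NF
    }'
}
```

Usage, e.g. on a login node: `df -hP | mounts_over 85`. The `-P` flag keeps long network filesystem names (like the Lustre entries above) on a single line, which the field counting in awk relies on.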

Links/mounts

  • lrwxrwxrwx 1 root root 24 Aug 23 2019 gpfs-projects -> /gpfs/theta-fs1/projects
  • lrwxrwxrwx 1 root root 24 Aug 23 2019 gpfs-soft -> /gpfs/theta-fs1/software
  • lrwxrwxrwx 1 root root 19 Apr 6 02:04 grand -> /lus/grand/projects
  • lrwxrwxrwx 1 root root 15 Mar 22 15:51 home -> /gpfs/mira-home

Loads

  • Tasks: 808 total, 2 running, 806 sleeping, 0 stopped, 0 zombie
  • %Cpu(s): 0.2 us, 0.1 sy, 3.1 ni, 96.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
  • MiB Mem : 257508.0+total, 182783.0+free, 41581.41+used, 33143.53+buff/cache
  • MiB Swap: 131072.0+total, 131072.0+free, 0.000 used. 207617.5+avail Mem

who | wc -l: 37 (this counts login sessions, so a user with several sessions is counted more than once)
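
`who` prints one line per login session, so the 37 above is a session count. A small sketch that also counts distinct user names (the first column of `who`); both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: `who` prints one line per login session.
# count_sessions counts lines; count_unique_users counts distinct names in column 1.
count_sessions() {
  wc -l | tr -d ' '
}

count_unique_users() {
  awk 'NF > 0 { print $1 }' | sort -u | wc -l | tr -d ' '
}
```

Usage: `who | count_unique_users`. The Tasks/%Cpu(s)/Mem/Swap lines under Loads are the summary header of `top`; `top -b -n 1 | head -n 5` prints them non-interactively (the first of the five lines is the uptime and load average).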

Benchmarks (KNL, thetalogin)

  • ssh suzunyan@theta.alcf.anl.gov

suzunyan@thetalogin6:~> module li
Currently Loaded Modulefiles:

  1) modules/3.2.11.4                                 13) dvs/2.12_2.2.172-7.0.2.1_8.1__g7056cbb6
  2) intel/19.1.0.166                                 14) alps/6.6.59-7.0.2.1_3.65__g872a8d62.ari
  3) craype-network-aries                             15) rca/2.2.20-7.0.2.1_2.78__g8e3fb5b.ari
  4) craype/2.6.5                                     16) atp/3.8.1
  5) cray-libsci/20.06.1                              17) perftools-base/20.06.0
  6) udreg/2.3.2-7.0.2.1_2.33__g8175d3d.ari           18) PrgEnv-intel/6.0.7
  7) ugni/6.0.14.0-7.0.2.1_3.60__ge78e5b0.ari         19) craype-mic-knl
  8) pmi/5.0.16                                       20) cray-mpich/7.7.14
  9) dmapp/7.1.1-7.0.2.1_2.78__g38cf134.ari           21) nompirun/nompirun
 10) gni-headers/5.0.12.0-7.0.2.1_2.19__g3b1768f.ari  22) adaptive-routing-a3
 11) xpmem/2.2.20-7.0.2.1_2.60__g87eb960.ari          23) darshan/3.2.1
 12) job/2.2.4-7.0.2.1_2.72__g36b56f4.ari             24) xalt
  • ln -s /grand/niubmrk/
  • cd niubmrk
  • mkdir modulefiles
  • mkdir archive
  • mkdir contrib
  • mkdir archive/bonnie++
  • cd archive/bonnie++
    • wget https://mirrors.sonic.net/epel/7/SRPMS/Packages/b/bonnie++-1.97.3-1.el7.src.rpm
    • module load gcc/9.3.0
    • rpm2cpio bonnie++-1.97.3-1.el7.src.rpm | cpio -i
    • tar -xzf bonnie++-1.97.3.tgz
    • cd bonnie++-1.97.3/
    • ./configure --prefix=/home/suzunyan/niubmrk/contrib/bonnie/bonnie++-1.97.3
    • make
    • make install
    • ls -l /home/suzunyan/niubmrk/contrib/bonnie/bonnie++-1.97.3
      • total 12
      • drwxr-sr-x 2 suzunyan niubmrk 4096 Apr 12 16:35 bin
      • drwxr-sr-x 2 suzunyan niubmrk 4096 Apr 12 16:35 sbin
      • drwxr-sr-x 3 suzunyan niubmrk 4096 Apr 12 16:35 share
    • create the modulefile bonnie++/bonnie++-1.97.3-gcc-9.3.0 under /home/suzunyan/niubmrk/modulefiles
    • module use /home/suzunyan/niubmrk/modulefiles
    • module load bonnie++/bonnie++-1.97.3-gcc-9.3.0
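
The modulefile step is not spelled out above. A minimal sketch of one way to do it, assuming a Tcl modulefile (readable both by Environment Modules on thetalogin and by Lmod on thetagpusn1). The helper name `write_modulefile`, the `module-whatis` text and the `share/man` MANPATH entry are assumptions, not the file actually used here.

```shell
#!/usr/bin/env bash
# Sketch: write a Tcl modulefile for the bonnie++ build above.
# write_modulefile MODROOT PREFIX
#   MODROOT: directory later passed to `module use`
#   PREFIX:  the --prefix given to ./configure
# Prints the path of the file it wrote.
write_modulefile() {
  local modroot="$1" prefix="$2"
  local name="bonnie++/bonnie++-1.97.3-gcc-9.3.0"
  local file="$modroot/$name"

  # The directory part of the module name becomes a subdirectory of MODROOT
  mkdir -p "$(dirname "$file")"

  # Single-quoted formats keep $root literal so Tcl expands it at load time
  {
    printf '#%%Module1.0\n'
    printf 'module-whatis "bonnie++ 1.97.3 (built with gcc/9.3.0)"\n'
    printf 'set root %s\n' "$prefix"
    printf 'prepend-path PATH    $root/bin\n'
    printf 'prepend-path PATH    $root/sbin\n'
    # Assumption: the man pages live under share/man
    printf 'prepend-path MANPATH $root/share/man\n'
  } > "$file"

  echo "$file"
}
```

Usage: `write_modulefile /home/suzunyan/niubmrk/modulefiles /home/suzunyan/niubmrk/contrib/bonnie/bonnie++-1.97.3`, then `module use` and `module load` as in the list above.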

GPU jobs

  • ssh thetagpusn1
  • Quad core AMD EPYC 7542 32-Core Processor
  • Note: on thetagpusn1 I had to create ~/.bash_profile that sources ~/.bashrc; on thetalogin this is done "automatically".
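
A minimal ~/.bash_profile for this. Bash login shells read the first of ~/.bash_profile, ~/.bash_login and ~/.profile and do not read ~/.bashrc on their own, so the file only needs to source ~/.bashrc when it exists. Once ~/.bash_profile exists, bash login shells no longer read ~/.profile.

```shell
# ~/.bash_profile (sketch): login shells read this file instead of ~/.bashrc,
# so pull in ~/.bashrc explicitly when it exists.
if [ -f "$HOME/.bashrc" ]; then
  . "$HOME/.bashrc"
fi
```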

Interactive queue submit

suzunyan@thetagpusn1:~$ qsub -I -n 1 -q single-gpu -t 00:05:00 -A niubmrk
Job routed to queue "single-gpu".
Wait for job 10014682 to start...
Opening interactive session to thetagpu05-gpu7
Welcome to NVIDIA DGX Server Version 4.99.9 (GNU/Linux 5.3.0-62-generic x86_64)

  System information as of Tue Apr 13 21:54:54 UTC 2021

  System load:  0.23             Users logged in:            3
  Usage of /:   1.0% of 1.72TB   IP address for enp226s0:    10.230.2.193
  Memory usage: 2%               IP address for infinibond0: 172.23.2.193
  Swap usage:   0%               IP address for docker0:     172.17.0.1
  Processes:    3055

suzunyan@thetagpu05:~$ nvidia-smi
Tue Apr 13 21:55:47 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.05    Driver Version: 450.51.05    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|

suzunyan@thetagpu05:~$ ls -l /usr/local/cuda
lrwxrwxrwx 1 root root 9 Sep 4 2020 /usr/local/cuda -> cuda-11.0

CPU on thetagpu05: quad core AMD EPYC 7742 64-Core Processor
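
The same checks (nvidia-smi, the /usr/local/cuda link, the CPU model) can be collected in a batch job instead of an interactive session. A sketch, assuming Cobalt reads `#COBALT` option lines at the top of the script and that it is submitted with `qsub gpu_check.sh` (hypothetical file name; it may also need `chmod +x`). The `sockets` line counts distinct `physical id` values in /proc/cpuinfo, which helps pin down what "quad core" means for the EPYC parts noted above.

```shell
#!/bin/bash
#COBALT -n 1
#COBALT -q single-gpu
#COBALT -t 00:05:00
#COBALT -A niubmrk
# Sketch: batch version of the interactive checks (assumes Cobalt honours
# the #COBALT lines above as qsub options).

gpu_node_report() {
  echo "host: $(uname -n)"
  # First "model name" line from the kernel's CPU table (x86 nodes)
  echo "cpu: $(grep -m1 '^model name' /proc/cpuinfo 2>/dev/null | cut -d: -f2- | sed 's/^ *//')"
  # Number of distinct CPU sockets ("physical id" values)
  echo "sockets: $(grep '^physical id' /proc/cpuinfo 2>/dev/null | sort -u | wc -l | tr -d ' ')"
  # Target of the /usr/local/cuda symlink (cuda-11.0 on thetagpu05)
  echo "cuda: $(readlink /usr/local/cuda 2>/dev/null || echo 'not a symlink or missing')"
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L   # one line per GPU that nvidia-smi can see
  else
    echo "gpus: nvidia-smi not found"
  fi
}

gpu_node_report
```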

suzunyan@thetagpusn1:~$ module av


------------------------------------------------ /home/suzunyan/niubmrk/modulefiles ---------------------------------------------------
   bonnie++/bonnie++-1.97.3-gcc-9.3.0

-------------------------------------------------- /usr/local/lmod/lmod/modulefiles ----------------------------------------------------
   Core/lmod    Core/settarg

---------------------------------- /lus/theta-fs0/software/environment/thetagpu/lmod/modulefiles --------------------------------------
   Core/StdEnv              (L,D)    conda/tensorflow/2020-11-11    conda/tensorflow/2021-03-02      (D)    openmpi/openmpi-4.1.0 (D)
   aocl/blis-3.0                     conda/tensorflow/2020-12-17    nccl/nccl-v2.8.4-1_CUDA11
   conda/pytorch/2020-11-25          conda/tensorflow/2020-12-23    openmpi/openmpi-4.0.5            (L)
   conda/pytorch/2021-03-02 (D)      conda/tensorflow/2021-01-08    openmpi/openmpi-4.1.0_ucx-1.10.0

----------------------------- /lus/theta-fs0/software/spack/share/spack/modules/linux-ubuntu18.04-x86_64 -------------------------------
   autoconf-2.69-gcc-7.5.0-wmttzuv                  gmp-6.1.2-gcc-7.5.0-3ol3tld          ncurses-6.2-gcc-7.5.0-crhlefo
   autoconf-archive-2019.01.06-gcc-7.5.0-bdyarrk    libiconv-1.16-gcc-7.5.0-jearpk4      perl-5.32.1-gcc-7.5.0-op6xocu
   automake-1.16.3-gcc-7.5.0-stmktof                libsigsegv-2.12-gcc-7.5.0-lbrx7ln    pkgconf-1.7.3-gcc-7.5.0-4sh6pym
   berkeley-db-18.1.40-gcc-7.5.0-vd7vwr5            libtool-2.4.6-gcc-7.5.0-jdxbjft      readline-8.0-gcc-7.5.0-t54jzdy
   diffutils-3.7-gcc-7.5.0-otkkten                  m4-1.4.18-gcc-7.5.0-mkc3u4x          zlib-1.2.11-gcc-7.5.0-smoyzzo
   gcc-10.2.0-gcc-7.5.0-jj2fh4j                     mpc-1.1.0-gcc-7.5.0-pj4yncj          zstd-1.4.5-gcc-7.5.0-rnf7xyj
   gdbm-1.18.1-gcc-7.5.0-4av4gyw                    mpfr-4.0.2-gcc-7.5.0-mpv2v7v

------------------------------- /lus/theta-fs0/software/spack/share/spack/modules/linux-ubuntu18.04-zen2 --------------------------------
   nvhpc-21.2-gcc-10.2.0-qjwyhvd

Build ROOT with tmva

  • mkdir -p ~/niubmrk/build/root/root-6.22.08-build
  • cd ~/niubmrk/build/root/root-6.22.08-build
  • cmake3 /grand/niubmrk/archive/root/root-6.22.08 -DCMAKE_INSTALL_PREFIX=/grand/niubmrk/contrib/root/root-6.22.08-cuda-11.0 -Dbuiltin_xrootd=ON -Dtmva=ON -Dimt=true -Dpythia8=ON -Droofit=ON -Dcuda=ON -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-11.0
  • cmake3 --build . -- -j8
  • cmake3 --build . --target install
  • cmake /grand/niubmrk/archive/root/root-6.22.08 -DCMAKE_INSTALL_PREFIX=/grand/niubmrk/contrib/root/root-6.22.08-cuda-11.0 -Dbuiltin_xrootd=ON -Dtmva=ON -Dimt=true -Droofit=ON -Dcuda=ON -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-11.0 -DPYTHON_EXECUTABLE=/usr/bin/python3
  • Fails with:
      CMake Error at cmake/modules/SearchInstalledSoftware.cmake:320 (message):
        libXpm and Xpm headers must be installed.
      Call Stack (most recent call first):
        CMakeLists.txt:192 (include)
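
The failing check belongs to ROOT's X11 support. Installing the Xpm development headers (libxpm-dev on Ubuntu) needs admin rights on the node; an untested alternative is to configure ROOT with X11 switched off (`-Dx11=OFF`), assuming a batch-only build without X11 graphics is acceptable for TMVA work. A sketch that reuses the options of the last `cmake` attempt above; `build_root` and the `CMAKE` override variable are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch (untested on ThetaGPU): configure, build and install ROOT 6.22.08
# with X11 disabled so CMake should skip the libXpm/Xpm check.
# Assumption: a ROOT without X11 graphics is acceptable.

SRC=/grand/niubmrk/archive/root/root-6.22.08
PREFIX=/grand/niubmrk/contrib/root/root-6.22.08-cuda-11.0

# Same options as the last cmake attempt above, plus -Dx11=OFF
ROOT_CMAKE_ARGS=(
  "$SRC"
  "-DCMAKE_INSTALL_PREFIX=$PREFIX"
  -Dbuiltin_xrootd=ON
  -Dtmva=ON
  -Dimt=true
  -Droofit=ON
  -Dcuda=ON
  -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-11.0
  -DPYTHON_EXECUTABLE=/usr/bin/python3
  -Dx11=OFF
)

# build_root BUILDDIR [JOBS]  (set CMAKE=cmake3 to use the cmake3 binary)
build_root() {
  local builddir="$1" jobs="${2:-8}"
  local cmake="${CMAKE:-cmake}"
  mkdir -p "$builddir" || return
  # Configure inside the build directory; the subshell keeps the caller's cwd
  ( cd "$builddir" && "$cmake" "${ROOT_CMAKE_ARGS[@]}" ) || return
  "$cmake" --build "$builddir" -- -j"$jobs" || return
  "$cmake" --build "$builddir" --target install
}
```

Usage: `build_root ~/niubmrk/build/root/root-6.22.08-build 8`, or `CMAKE=cmake3 build_root ...` if `cmake3` is the binary to use.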


Page last modified on April 20, 2021, at 11:31 PM EST

Created: Sergey A. Uzunyan