ANL Cluster
Authentication/get started
System
thetagpusn1: Welcome to Ubuntu 18.04.5 LTS (GNU/Linux 4.15.0-112-generic x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
* Canonical Livepatch is available for installation.
  - Reduce system reboots and improve kernel security. Activate at: https://ubuntu.com/livepatch
Failed to connect to https://changelogs.ubuntu.com/meta-release-lts. Check your Internet connection or proxy settings
The programs included with the Ubuntu system are free software; the exact distribution terms for each program are described in the individual files in /usr/share/doc/*/copyright.
Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by applicable law.
Disk organization
- /dev/sda2 2.0G 938M 913M 51% /boot
 - /dev/sdb1 9.8G 23M 9.2G 1% /var/crash
 - /dev/sda4 126G 6.4G 120G 6% /tmp
 - /dev/sdb2 408G 151G 257G 37% /var/opt/cray/persistent
 - /dev/loop1 21G 21G 0 100% /var/opt/cray/imps-image-binding/PE/squash_mounts/squashfs_sDWXKu_mount_point
 - /dev/loop2 2.5G 2.5G 0 100% /var/opt/cray/imps-image-binding/diags/squash_mounts/squashfs_0S43Gm_mount_point
 - 172.22.10.92@o2ib:172.22.10.93@o2ib:/snx11214 9.2P 8.1P 1.1P 89% /lus/theta-fs0
 - 172.22.12.130@o2ib21:172.22.12.131@o2ib21:/grand 91P 39P 52P 43% /lus/grand
 - 172.22.12.2@o2ib22:172.22.12.3@o2ib22:/eagle 91P 35P 56P 39% /lus/eagle
 
- rhino-01-infra.tmi.alcf.anl.gov:/pbs/theta 6.0G 5.1G 956M 85% /pbs
- rhino-01-infra.tmi.alcf.anl.gov:/xalt/theta 434G 345G 89G 80% /var/log/xalt
 - mira-home 1.1P 80T 947T 8% /gpfs/mira-home
 - theta-fs1 7.9P 6.4P 1.6P 81% /gpfs/theta-fs1
 
Links/mounts
- lrwxrwxrwx 1 root root 24 Aug 23 2019 gpfs-projects -> /gpfs/theta-fs1/projects
 - lrwxrwxrwx 1 root root 24 Aug 23 2019 gpfs-soft -> /gpfs/theta-fs1/software
 - lrwxrwxrwx 1 root root 19 Apr 6 02:04 grand -> /lus/grand/projects
 - lrwxrwxrwx 1 root root 15 Mar 22 15:51 home -> /gpfs/mira-home
 
Loads
- Tasks: 808 total, 2 running, 806 sleeping, 0 stopped, 0 zombie
 - %Cpu(s): 0.2 us, 0.1 sy, 3.1 ni, 96.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
 - MiB Mem : 257508.0+total, 182783.0+free, 41581.41+used, 33143.53+buff/cache
 - MiB Swap: 131072.0+total, 131072.0+free, 0.000 used. 207617.5+avail Mem
 
who | wc -l: 37 users
Benchmarks (KNL, thetalogin)
- ssh suzunyan@theta.alcf.anl.gov
 
suzunyan@thetalogin6:~> module li
Currently Loaded Modulefiles:
1) modules/3.2.11.4
2) intel/19.1.0.166
3) craype-network-aries
4) craype/2.6.5
5) cray-libsci/20.06.1
6) udreg/2.3.2-7.0.2.1_2.33__g8175d3d.ari
7) ugni/6.0.14.0-7.0.2.1_3.60__ge78e5b0.ari
8) pmi/5.0.16
9) dmapp/7.1.1-7.0.2.1_2.78__g38cf134.ari
10) gni-headers/5.0.12.0-7.0.2.1_2.19__g3b1768f.ari
11) xpmem/2.2.20-7.0.2.1_2.60__g87eb960.ari
12) job/2.2.4-7.0.2.1_2.72__g36b56f4.ari
13) dvs/2.12_2.2.172-7.0.2.1_8.1__g7056cbb6
14) alps/6.6.59-7.0.2.1_3.65__g872a8d62.ari
15) rca/2.2.20-7.0.2.1_2.78__g8e3fb5b.ari
16) atp/3.8.1
17) perftools-base/20.06.0
18) PrgEnv-intel/6.0.7
19) craype-mic-knl
20) cray-mpich/7.7.14
21) nompirun/nompirun
22) adaptive-routing-a3
23) darshan/3.2.1
24) xalt
- ln -s /grand/niubmrk/
 - cd niubmrk
 - mkdir modulefiles
 - mkdir archive
 - mkdir contrib
 - mkdir archive/bonnie++
 - cd archive/bonnie++
- wget https://mirrors.sonic.net/epel/7/SRPMS/Packages/b/bonnie++-1.97.3-1.el7.src.rpm
 - module load gcc/9.3.0
 - rpm2cpio bonnie++-1.97.3-1.el7.src.rpm | cpio -i
 - tar -xzf bonnie++-1.97.3.tgz
 - cd bonnie++-1.97.3/
 - ./configure --prefix=/home/suzunyan/niubmrk/contrib/bonnie/bonnie++-1.97.3
 - make
 - make install
 - ls -l /home/suzunyan/niubmrk/contrib/bonnie/bonnie++-1.97.3
- total 12
 - drwxr-sr-x 2 suzunyan niubmrk 4096 Apr 12 16:35 bin
 - drwxr-sr-x 2 suzunyan niubmrk 4096 Apr 12 16:35 sbin
 - drwxr-sr-x 3 suzunyan niubmrk 4096 Apr 12 16:35 share
 
 - create the modulefile bonnie++/bonnie++-1.97.3-gcc-9.3.0 under /home/suzunyan/niubmrk/modulefiles
 - module use /home/suzunyan/niubmrk/modulefiles
 - module load bonnie++/bonnie++-1.97.3-gcc-9.3.0
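
The notes do not record what the modulefile contains. A minimal sketch of how it could be generated, assuming a plain Tcl modulefile that only prepends the install's bin and sbin directories to PATH. The helper name write_bonnie_modulefile and the file contents are illustrative, not the file actually used here:

```shell
# Sketch: write a minimal Tcl modulefile for a bonnie++ install.
# write_bonnie_modulefile is a hypothetical helper:
#   $1 = modulefiles root, $2 = install prefix.
write_bonnie_modulefile() {
    modroot="$1"
    prefix="$2"
    modfile="$modroot/bonnie++/bonnie++-1.97.3-gcc-9.3.0"
    mkdir -p "$modroot/bonnie++" || return 1
    # The "#%Module" magic cookie must be the first line of a Tcl modulefile.
    cat > "$modfile" <<EOF
#%Module1.0
## bonnie++ 1.97.3 built with gcc/9.3.0 (contents are an assumption)
prepend-path PATH $prefix/bin
prepend-path PATH $prefix/sbin
EOF
    # Print the path of the file that was written.
    echo "$modfile"
}
```

With the paths from these notes, the call would be write_bonnie_modulefile /home/suzunyan/niubmrk/modulefiles /home/suzunyan/niubmrk/contrib/bonnie/bonnie++-1.97.3, followed by the module use and module load commands above.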
 
 
GPU jobs
- ssh thetagpusn1
 - Quad core AMD EPYC 7542 32-Core Processor
 - Note: I had to create ~/.bash_profile to source ~/.bashrc; on the thetalogin nodes this happens automatically
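
The exact .bash_profile is not recorded in these notes. A minimal sketch of one way to set it up, assuming bash as the login shell. The helper name ensure_bashrc_sourced is hypothetical; it appends the usual "source ~/.bashrc if it exists" stanza to a profile file and does nothing if the stanza is already there:

```shell
# Sketch: make a login-shell profile source ~/.bashrc (safe to run repeatedly).
# ensure_bashrc_sourced is a hypothetical helper; $1 defaults to ~/.bash_profile.
ensure_bashrc_sourced() {
    profile="${1:-$HOME/.bash_profile}"
    marker='# source ~/.bashrc for login shells'
    # Skip if the stanza was already added by an earlier run.
    if [ -f "$profile" ] && grep -qF -- "$marker" "$profile"; then
        return 0
    fi
    # Quoted heredoc delimiter: $HOME is written literally and expanded at login.
    cat >> "$profile" <<'EOF'
# source ~/.bashrc for login shells
if [ -f "$HOME/.bashrc" ]; then
    . "$HOME/.bashrc"
fi
EOF
}
```

Running ensure_bashrc_sourced with no argument on thetagpusn1 would then edit ~/.bash_profile in place; the new setting takes effect at the next login.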
 
Interactive queue submit
suzunyan@thetagpusn1:~$ qsub -I -n 1 -q single-gpu -t 00:05:00 -A niubmrk
Job routed to queue "single-gpu".
Wait for job 10014682 to start...
Opening interactive session to thetagpu05-gpu7
Welcome to NVIDIA DGX Server Version 4.99.9 (GNU/Linux 5.3.0-62-generic x86_64)
System information as of Tue Apr 13 21:54:54 UTC 2021
System load: 0.23
Users logged in: 3
Usage of /: 1.0% of 1.72TB
IP address for enp226s0: 10.230.2.193
Memory usage: 2%
IP address for infinibond0: 172.23.2.193
Swap usage: 0%
IP address for docker0: 172.17.0.1
Processes: 3055
suzunyan@thetagpu05:~$ nvidia-smi
Tue Apr 13 21:55:47 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.05    Driver Version: 450.51.05    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
suzunyan@thetagpu05:~$ ls -l /usr/local/cuda
lrwxrwxrwx 1 root root 9 Sep 4 2020 /usr/local/cuda -> cuda-11.0
thetagpu05: quad core AMD EPYC 7742 64-Core Processor
suzunyan@thetagpusn1:~$ module av
/home/suzunyan/niubmrk/modulefiles ---------------------------------------------------
bonnie++/bonnie++-1.97.3-gcc-9.3.0
/usr/local/lmod/lmod/modulefiles ----------------------------------------------------
Core/lmod Core/settarg
/lus/theta-fs0/software/environment/thetagpu/lmod/modulefiles --------------------------------------
Core/StdEnv (L,D)
aocl/blis-3.0
conda/pytorch/2020-11-25
conda/pytorch/2021-03-02 (D)
conda/tensorflow/2020-11-11
conda/tensorflow/2020-12-17
conda/tensorflow/2020-12-23
conda/tensorflow/2021-01-08
conda/tensorflow/2021-03-02 (D)
nccl/nccl-v2.8.4-1_CUDA11
openmpi/openmpi-4.0.5 (L)
openmpi/openmpi-4.1.0 (D)
openmpi/openmpi-4.1.0_ucx-1.10.0
/lus/theta-fs0/software/spack/share/spack/modules/linux-ubuntu18.04-x86_64 -------------------------------
autoconf-2.69-gcc-7.5.0-wmttzuv
autoconf-archive-2019.01.06-gcc-7.5.0-bdyarrk
automake-1.16.3-gcc-7.5.0-stmktof
berkeley-db-18.1.40-gcc-7.5.0-vd7vwr5
diffutils-3.7-gcc-7.5.0-otkkten
gcc-10.2.0-gcc-7.5.0-jj2fh4j
gdbm-1.18.1-gcc-7.5.0-4av4gyw
gmp-6.1.2-gcc-7.5.0-3ol3tld
libiconv-1.16-gcc-7.5.0-jearpk4
libsigsegv-2.12-gcc-7.5.0-lbrx7ln
libtool-2.4.6-gcc-7.5.0-jdxbjft
m4-1.4.18-gcc-7.5.0-mkc3u4x
mpc-1.1.0-gcc-7.5.0-pj4yncj
mpfr-4.0.2-gcc-7.5.0-mpv2v7v
ncurses-6.2-gcc-7.5.0-crhlefo
perl-5.32.1-gcc-7.5.0-op6xocu
pkgconf-1.7.3-gcc-7.5.0-4sh6pym
readline-8.0-gcc-7.5.0-t54jzdy
zlib-1.2.11-gcc-7.5.0-smoyzzo
zstd-1.4.5-gcc-7.5.0-rnf7xyj
/lus/theta-fs0/software/spack/share/spack/modules/linux-ubuntu18.04-zen2 --------------------------------
nvhpc-21.2-gcc-10.2.0-qjwyhvd
Build ROOT with TMVA
- mkdir -p ~/niubmrk/build/root/root-6.22.08-build
 - cd ~/niubmrk/build/root/root-6.22.08-build
 - cmake3 /grand/niubmrk/archive/root/root-6.22.08 -DCMAKE_INSTALL_PREFIX=/grand/niubmrk/contrib/root/root-6.22.08-cuda-11.0 -Dbuiltin_xrootd=ON -Dtmva=ON -Dimt=true -Dpythia8=ON -Droofit=ON -Dcuda=ON -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-11.0
 - cmake3 --build . -- -j8
 - cmake3 --build . --target install
 - cmake /grand/niubmrk/archive/root/root-6.22.08 -DCMAKE_INSTALL_PREFIX=/grand/niubmrk/contrib/root/root-6.22.08-cuda-11.0 -Dbuiltin_xrootd=ON -Dtmva=ON -Dimt=true -Droofit=ON -Dcuda=ON -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-11.0 -DPYTHON_EXECUTABLE=/usr/bin/python3
 - CMake Error at cmake/modules/SearchInstalledSoftware.cmake:320 (message):
 - libXpm and Xpm headers must be installed.
 - Call Stack (most recent call first):
 - CMakeLists.txt:192 (include)
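
The error above means the ROOT configure step could not find the Xpm development files. A minimal sketch for checking this before rerunning cmake, assuming the header lives at X11/xpm.h under /usr/include (on Ubuntu it is shipped by the libxpm-dev package). The helper name have_xpm_header is hypothetical, and the -Dx11=OFF workaround in the message is an untested assumption for this ROOT version:

```shell
# Sketch: check for the Xpm header that ROOT's configure step requires.
# have_xpm_header is a hypothetical helper; $1 defaults to /usr/include.
have_xpm_header() {
    incroot="${1:-/usr/include}"
    [ -f "$incroot/X11/xpm.h" ]
}

if have_xpm_header; then
    echo "X11/xpm.h found"
else
    # Assumption: disabling ROOT's x11 option skips the Xpm check entirely.
    echo "X11/xpm.h missing: request libxpm-dev from the admins, or try -Dx11=OFF"
fi
```

If the header is missing on the ThetaGPU nodes, installing the package needs administrator rights, so disabling X11 support may be the only option available to a regular user.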