ANL Cluster
Authentication/get started
System
thetagpusn1:: Welcome to Ubuntu 18.04.5 LTS (GNU/Linux 4.15.0-112-generic x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
* Canonical Livepatch is available for installation.
  - Reduce system reboots and improve kernel security. Activate at: https://ubuntu.com/livepatch
Failed to connect to https://changelogs.ubuntu.com/meta-release-lts. Check your Internet connection or proxy settings
The programs included with the Ubuntu system are free software; the exact distribution terms for each program are described in the individual files in /usr/share/doc/*/copyright.
Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by applicable law.
Disk organization
- /dev/sda2 2.0G 938M 913M 51% /boot
- /dev/sdb1 9.8G 23M 9.2G 1% /var/crash
- /dev/sda4 126G 6.4G 120G 6% /tmp
- /dev/sdb2 408G 151G 257G 37% /var/opt/cray/persistent
- /dev/loop1 21G 21G 0 100% /var/opt/cray/imps-image-binding/PE/squash_mounts/squashfs_sDWXKu_mount_point
- /dev/loop2 2.5G 2.5G 0 100% /var/opt/cray/imps-image-binding/diags/squash_mounts/squashfs_0S43Gm_mount_point
- 172.22.10.92@o2ib:172.22.10.93@o2ib:/snx11214 9.2P 8.1P 1.1P 89% /lus/theta-fs0
- 172.22.12.130@o2ib21:172.22.12.131@o2ib21:/grand 91P 39P 52P 43% /lus/grand
- 172.22.12.2@o2ib22:172.22.12.3@o2ib22:/eagle 91P 35P 56P 39% /lus/eagle
- rhino-01-infra.tmi.alcf.anl.gov:/pbs/theta 6.0G 5.1G 956M 85% /pbs
- rhino-01-infra.tmi.alcf.anl.gov:/xalt/theta 434G 345G 89G 80% /var/log/xalt
- mira-home 1.1P 80T 947T 8% /gpfs/mira-home
- theta-fs1 7.9P 6.4P 1.6P 81% /gpfs/theta-fs1
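To spot the nearly full filesystems in a listing like the one above without reading every row, the `df` output can be filtered by usage. This is only a sketch: `flag_full` is a hypothetical helper name and the threshold of 85 is an arbitrary choice, neither of which comes from the original notes.

```shell
#!/bin/sh
# Print mount points whose usage is at or above a threshold (percent).
# flag_full is a hypothetical helper name, not an ALCF-provided tool.
flag_full() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)            # "89%" -> "89"
        if (use + 0 >= limit) print $6, $5
    }'
}

# -P keeps each filesystem on one line so the column positions stay stable.
df -hP | flag_full 85
```

Applied to the listing above, a threshold of 85 would report the two squashfs loop mounts (100%), /lus/theta-fs0 (89%) and /pbs (85%).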
Links/mounts
- lrwxrwxrwx 1 root root 24 Aug 23 2019 gpfs-projects -> /gpfs/theta-fs1/projects
- lrwxrwxrwx 1 root root 24 Aug 23 2019 gpfs-soft -> /gpfs/theta-fs1/software
- lrwxrwxrwx 1 root root 19 Apr 6 02:04 grand -> /lus/grand/projects
- lrwxrwxrwx 1 root root 15 Mar 22 15:51 home -> /gpfs/mira-home
Loads
- Tasks: 808 total, 2 running, 806 sleeping, 0 stopped, 0 zombie
- %Cpu(s): 0.2 us, 0.1 sy, 3.1 ni, 96.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
- MiB Mem : 257508.0+total, 182783.0+free, 41581.41+used, 33143.53+buff/cache
- MiB Swap: 131072.0+total, 131072.0+free, 0.000 used. 207617.5+avail Mem
- who | wc -l: 37 users
Benchmarks (KNL, thetalogin)
- ssh suzunyan@theta.alcf.anl.gov
suzunyan@thetalogin6:~> module li
Currently Loaded Modulefiles:
1) modules/3.2.11.4
2) intel/19.1.0.166
3) craype-network-aries
4) craype/2.6.5
5) cray-libsci/20.06.1
6) udreg/2.3.2-7.0.2.1_2.33__g8175d3d.ari
7) ugni/6.0.14.0-7.0.2.1_3.60__ge78e5b0.ari
8) pmi/5.0.16
9) dmapp/7.1.1-7.0.2.1_2.78__g38cf134.ari
10) gni-headers/5.0.12.0-7.0.2.1_2.19__g3b1768f.ari
11) xpmem/2.2.20-7.0.2.1_2.60__g87eb960.ari
12) job/2.2.4-7.0.2.1_2.72__g36b56f4.ari
13) dvs/2.12_2.2.172-7.0.2.1_8.1__g7056cbb6
14) alps/6.6.59-7.0.2.1_3.65__g872a8d62.ari
15) rca/2.2.20-7.0.2.1_2.78__g8e3fb5b.ari
16) atp/3.8.1
17) perftools-base/20.06.0
18) PrgEnv-intel/6.0.7
19) craype-mic-knl
20) cray-mpich/7.7.14
21) nompirun/nompirun
22) adaptive-routing-a3
23) darshan/3.2.1
24) xalt
- ln -s /grand/niubmrk/
- cd niubmrk
- mkdir modulefiles
- mkdir archive
- mkdir contrib
- mkdir archive/bonnie++
- cd archive/bonnie++
- wget https://mirrors.sonic.net/epel/7/SRPMS/Packages/b/bonnie++-1.97.3-1.el7.src.rpm
- module load gcc/9.3.0
- rpm2cpio bonnie++-1.97.3-1.el7.src.rpm | cpio -i
- tar -xzf bonnie++-1.97.3.tgz
- cd bonnie++-1.97.3/
- ./configure --prefix=/home/suzunyan/niubmrk/contrib/bonnie/bonnie++-1.97.3
- make
- make install
- ls -l /home/suzunyan/niubmrk/contrib/bonnie/bonnie++-1.97.3
- total 12
- drwxr-sr-x 2 suzunyan niubmrk 4096 Apr 12 16:35 bin
- drwxr-sr-x 2 suzunyan niubmrk 4096 Apr 12 16:35 sbin
- drwxr-sr-x 3 suzunyan niubmrk 4096 Apr 12 16:35 share
- Create the modulefile bonnie++/bonnie++-1.97.3-gcc-9.3.0 under /home/suzunyan/niubmrk/modulefiles
- module use /home/suzunyan/niubmrk/modulefiles
- module load bonnie++/bonnie++-1.97.3-gcc-9.3.0
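The notes do not record what the bonnie++ modulefile contains. The sketch below shows one way it could be written, assuming a plain Tcl modulefile that only puts the install's bin and sbin directories on PATH. The `MODROOT` and `PREFIX` variables and the `module-whatis` text are illustrative assumptions; the paths mirror the ones used in the steps above.

```shell
#!/bin/sh
# Assumed locations, taken from the steps above; override via the environment.
MODROOT="${MODROOT:-$HOME/niubmrk/modulefiles}"
PREFIX="${PREFIX:-$HOME/niubmrk/contrib/bonnie/bonnie++-1.97.3}"
MODFILE="$MODROOT/bonnie++/bonnie++-1.97.3-gcc-9.3.0"

mkdir -p "$(dirname "$MODFILE")"

# Hypothetical modulefile body: the original file is not shown in these notes.
# $PREFIX is expanded now; \$prefix stays literal for the module system to expand.
cat > "$MODFILE" <<EOF
#%Module1.0
module-whatis "bonnie++ 1.97.3 built with gcc 9.3.0"
set prefix $PREFIX
prepend-path PATH \$prefix/bin
prepend-path PATH \$prefix/sbin
EOF

echo "wrote $MODFILE"
```

After this, `module use` on the modulefiles directory followed by `module load bonnie++/bonnie++-1.97.3-gcc-9.3.0` should make the binaries from both directories visible.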
GPU jobs
- ssh thetagpusn1
- Quad core AMD EPYC 7542 32-Core Processor
- Note: on thetagpusn1 I had to create a .bash_profile that sources .bashrc; on thetalogin this happens automatically
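The .bash_profile itself is not reproduced in the notes. A minimal version, assuming nothing else needs to go in the file, could be created like this; the check for an existing mention of .bashrc is an added safeguard so the hook is not appended twice.

```shell
#!/bin/sh
# Append a hook to ~/.bash_profile unless the file already mentions .bashrc.
# The exact contents of the author's file are not known; this is an assumption.
profile="$HOME/.bash_profile"
if ! grep -qs 'bashrc' "$profile"; then
    printf '%s\n' \
        'if [ -f "$HOME/.bashrc" ]; then' \
        '    . "$HOME/.bashrc"' \
        'fi' >> "$profile"
fi
```

Login shells (such as the one started by ssh) read .bash_profile rather than .bashrc, which is why the hook is needed when the system does not already provide one.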
Interactive queue submit
suzunyan@thetagpusn1:~$ qsub -I -n 1 -q single-gpu -t 00:05:00 -A niubmrk
Job routed to queue "single-gpu".
Wait for job 10014682 to start...
Opening interactive session to thetagpu05-gpu7
Welcome to NVIDIA DGX Server Version 4.99.9 (GNU/Linux 5.3.0-62-generic x86_64)
System information as of Tue Apr 13 21:54:54 UTC 2021
  System load:  0.23              Users logged in:            3
  Usage of /:   1.0% of 1.72TB    IP address for enp226s0:    10.230.2.193
  Memory usage: 2%                IP address for infinibond0: 172.23.2.193
  Swap usage:   0%                IP address for docker0:     172.17.0.1
  Processes:    3055
suzunyan@thetagpu05:~$ nvidia-smi
Tue Apr 13 21:55:47 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.05    Driver Version: 450.51.05    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
suzunyan@thetagpu05:~$ ls -l /usr/local/cuda
lrwxrwxrwx 1 root root 9 Sep 4 2020 /usr/local/cuda -> cuda-11.0
- Quad core AMD EPYC 7742 64-Core Processor (thetagpu05)
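For repeated use, the interactive submission shown in the session above can be assembled from variables. The sketch below only builds and prints the command (a dry run), so it is safe to try anywhere; the variable names are illustrative, and the values are the ones from that session.

```shell
#!/bin/sh
# Values from the session above; the variable names are illustrative only.
PROJECT="${PROJECT:-niubmrk}"
QUEUE="${QUEUE:-single-gpu}"
NODES="${NODES:-1}"
WALLTIME="${WALLTIME:-00:05:00}"

# Build the interactive submit line without running it.
qsub_cmd="qsub -I -n ${NODES} -q ${QUEUE} -t ${WALLTIME} -A ${PROJECT}"
echo "$qsub_cmd"

# On thetagpusn1, run the printed line (or: eval "$qsub_cmd") to submit.
```

Overriding a single value, for example `WALLTIME=00:30:00`, changes only that option and leaves the rest of the line as in the original session.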
suzunyan@thetagpusn1:~$ module av
/home/suzunyan/niubmrk/modulefiles ---------------------------------------------------
bonnie++/bonnie++-1.97.3-gcc-9.3.0
/usr/local/lmod/lmod/modulefiles ----------------------------------------------------
Core/lmod Core/settarg
/lus/theta-fs0/software/environment/thetagpu/lmod/modulefiles --------------------------------------
Core/StdEnv (L,D)              conda/tensorflow/2020-11-11   conda/tensorflow/2021-03-02 (D)     openmpi/openmpi-4.1.0 (D)
aocl/blis-3.0                  conda/tensorflow/2020-12-17   nccl/nccl-v2.8.4-1_CUDA11
conda/pytorch/2020-11-25       conda/tensorflow/2020-12-23   openmpi/openmpi-4.0.5 (L)
conda/pytorch/2021-03-02 (D)   conda/tensorflow/2021-01-08   openmpi/openmpi-4.1.0_ucx-1.10.0
/lus/theta-fs0/software/spack/share/spack/modules/linux-ubuntu18.04-x86_64 -------------------------------
autoconf-2.69-gcc-7.5.0-wmttzuv               gmp-6.1.2-gcc-7.5.0-3ol3tld        ncurses-6.2-gcc-7.5.0-crhlefo
autoconf-archive-2019.01.06-gcc-7.5.0-bdyarrk libiconv-1.16-gcc-7.5.0-jearpk4    perl-5.32.1-gcc-7.5.0-op6xocu
automake-1.16.3-gcc-7.5.0-stmktof             libsigsegv-2.12-gcc-7.5.0-lbrx7ln  pkgconf-1.7.3-gcc-7.5.0-4sh6pym
berkeley-db-18.1.40-gcc-7.5.0-vd7vwr5         libtool-2.4.6-gcc-7.5.0-jdxbjft    readline-8.0-gcc-7.5.0-t54jzdy
diffutils-3.7-gcc-7.5.0-otkkten               m4-1.4.18-gcc-7.5.0-mkc3u4x        zlib-1.2.11-gcc-7.5.0-smoyzzo
gcc-10.2.0-gcc-7.5.0-jj2fh4j                  mpc-1.1.0-gcc-7.5.0-pj4yncj        zstd-1.4.5-gcc-7.5.0-rnf7xyj
gdbm-1.18.1-gcc-7.5.0-4av4gyw                 mpfr-4.0.2-gcc-7.5.0-mpv2v7v
/lus/theta-fs0/software/spack/share/spack/modules/linux-ubuntu18.04-zen2 --------------------------------
nvhpc-21.2-gcc-10.2.0-qjwyhvd
Build ROOT with tmva
- mkdir -p ~/niubmrk/build/root/root-6.22.08-build
- cd ~/niubmrk/build/root/root-6.22.08-build
- cmake3 /grand/niubmrk/archive/root/root-6.22.08 -DCMAKE_INSTALL_PREFIX=/grand/niubmrk/contrib/root/root-6.22.08-cuda-11.0 -Dbuiltin_xrootd=ON -Dtmva=ON -Dimt=true -Dpythia8=ON -Droofit=ON -Dcuda=ON -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-11.0
- cmake3 --build . -- -j8
- cmake3 --build . --target install
- cmake /grand/niubmrk/archive/root/root-6.22.08 -DCMAKE_INSTALL_PREFIX=/grand/niubmrk/contrib/root/root-6.22.08-cuda-11.0 -Dbuiltin_xrootd=ON -Dtmva=ON -Dimt=true -Droofit=ON -Dcuda=ON -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-11.0 -DPYTHON_EXECUTABLE=/usr/bin/python3
- CMake Error at cmake/modules/SearchInstalledSoftware.cmake:320 (message):
    libXpm and Xpm headers must be installed.
  Call Stack (most recent call first):
    CMakeLists.txt:192 (include)
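The error means the configure step looked for the X11 Xpm headers and did not find them. The notes do not say how this was resolved. One possible way forward, sketched below, is to check for the header and, if it is missing, configure ROOT without X11 support. Two things here are assumptions that were not verified on ThetaGPU: that the header would live at /usr/include/X11/xpm.h (where Ubuntu's libxpm-dev package installs it), and that turning off ROOT's `x11` build option skips this check for 6.22.08.

```shell
#!/bin/sh
# Assumed header location (Ubuntu libxpm-dev); adjust if the site differs.
XPM_HEADER="${XPM_HEADER:-/usr/include/X11/xpm.h}"

if [ -f "$XPM_HEADER" ]; then
    X11_FLAG="-Dx11=ON"     # header present: keep X11 graphics support
else
    X11_FLAG="-Dx11=OFF"    # header missing: assumption that this bypasses the check
fi

echo "Xpm header: $XPM_HEADER"
echo "Suggested extra cmake option: $X11_FLAG"
```

The printed option would be appended to the cmake configure line above. Building without X11 drops ROOT's native X11 graphics, which may be acceptable for batch benchmark runs on compute nodes.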