Use 'srun' and not 'salloc' on hansen/shiller
Created by: bartlettroscoe
CC: @fryeguy52
Description
The commits and the updated text in the README.md file are self-explanatory but in short you need to allocate jobs on 'hansen' with srun and not salloc.
Motivation and Context
If you use salloc, SLURM runs the job on the login node 'hansen01' but thinks it is running it on a compute node. If you use srun, it runs the job correctly on a compute node. Crazy.
This was likely the cause of the test timeouts and strange behavior that we have been seeing lately on 'hansen', as documented in issues #2925, #2919 and #2913 (closed).
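A guard like the following could make this kind of mistake fail loudly instead of silently running on the login node. This is only a sketch and not part of this PR: the function name assert_not_login_node is hypothetical, and the default login node name 'hansen01' is taken from the description above.

```shell
#!/usr/bin/env bash

# Hypothetical guard (not part of this PR): return non-zero when the given
# host is the login node.
#   $1 = host to check (defaults to the short name of the current host)
#   $2 = login node name (defaults to 'hansen01', an assumption from the text)
assert_not_login_node() {
  local host="${1:-$(hostname -s)}"
  local login_node="${2:-hansen01}"
  if [ "$host" = "$login_node" ]; then
    echo "ERROR: running on login node '$host'" >&2
    return 1
  fi
  echo "OK: running on '$host'"
}
```

Called at the top of a script that is launched through srun, this would abort early if the job landed on the login node.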
How Has This Been Tested?
On 'hansen' I ran:
$ time env \
JOB_NAME=Trilinos-atdm-hansen-shiller-gnu-debug-serial \
WORKSPACE=$PWD \
Trilinos_PACKAGES=Kokkos,Teuchos \
CTEST_TEST_TYPE=Experimental \
CTEST_DO_SUBMIT=OFF \
CTEST_DO_UPDATES=OFF \
CTEST_START_WITH_EMPTY_BINARY_DIRECTORY=FALSE \
~/Trilinos.base/Trilinos/cmake/ctest/drivers/atdm/smart-jenkins-driver.sh \
&> console.out
real 66m37.153s
user 0m0.102s
sys 0m0.114s
The console.out file showed:
+ /usr/bin/srun -N 1 --constraint=k80 -J Trilinos-atdm-hansen-shiller-gnu-debug-serial --time=180 /ascldap/users/rabartl/Trilinos.base/BUILD/HANSEN/JENKINS_DRIVER/Trilinos/cmake/ctest/drivers/atdm/ctest-s-driver.sh
srun: job 1003128 queued and waiting for resources
srun: job 1003128 has been allocated resources
Starting processing .bash_profile
Skipping load of the dev env because /ascldap/users/rabartl/load_dev_env.sh does not exist!
Ending processing .bash_profile
Start: ctest-s-driver.sh
==> Wed Jun 13 18:21:21 MDT 2018
Loading env and running ctest -S comamnd to configure, build, and test ...
Current dir: /ascldap/users/rabartl/Trilinos.base/BUILD/HANSEN/JENKINS_DRIVER/SRC_AND_BUILD
Hostname 'hansen04' matches known ATDM host 'hansen' and system 'shiller'
ATDM_CONFIG_TRILNOS_DIR = /home/rabartl/Trilinos.base/Trilinos
Setting default compiler and build options for JOB_NAME='Trilinos-atdm-hansen-shiller-gnu-debug-serial'
Using hansen/shiller compiler stack GNU to build DEBUG code with Kokkos node type SERIAL
See, it showed:
Hostname 'hansen04' matches known ATDM host 'hansen' and system 'shiller'
That shows that it is actually running on the compute node.
I also verified by looking in the file /SRC_AND_BUILD/BUILD/Testing/Temporary/LastConfigure_20180614-0021.log that it shows:
-- Trilinos_HOSTNAME='hansen04'
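The same check can be scripted rather than done by eye. The sketch below assumes the configure log contains a line of exactly the form shown above; the helper name trilinos_hostname_from_log is hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the hostname recorded in a configure log,
# assuming a line of the form:  -- Trilinos_HOSTNAME='<host>'
#   $1 = path to the LastConfigure_*.log file
trilinos_hostname_from_log() {
  sed -n "s/^-- Trilinos_HOSTNAME='\(.*\)'\$/\1/p" "$1"
}
```

For example, running it on the LastConfigure_20180614-0021.log file above and comparing the result against the login node name would show whether the configure step ran on a compute node.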