Use 'srun' and not 'salloc' on hansen/shiller

Created by: bartlettroscoe

CC: @fryeguy52

Description

The commits and the updated text in the README.md file are self-explanatory, but in short: jobs on 'hansen' need to be allocated with srun and not salloc.

Motivation and Context

If you use salloc, SLURM runs the job on the login node 'hansen01' but thinks it is running on a compute node. If you use srun, the job runs correctly on a compute node. Crazy.

This is likely what was causing all of the test timeouts and strange behavior that we have been seeing lately on 'hansen', as documented in issues #2925, #2919 and #2913 (closed).
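A quick way to see which node a job step actually lands on is to compare its hostname against the login node. The following is only a minimal sketch: `is_login_node` and `check_node` are hypothetical helpers (not part of the ATDM scripts), and it assumes 'hansen01' is the only login node, since that is the one named above.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the ATDM driver scripts.
# Assumption: 'hansen01' is the only login node (the one named in this MR).

# Return 0 if the given hostname is the login node, 1 otherwise.
is_login_node() {
  case "$1" in
    hansen01) return 0 ;;
    *)        return 1 ;;
  esac
}

# Print a one-line verdict for the given hostname.
check_node() {
  if is_login_node "$1"; then
    echo "WARNING: '$1' is the login node"
  else
    echo "OK: '$1' is not the login node"
  fi
}
```

For example, `check_node "$(srun -N 1 hostname)"` would report on the node that srun picked, while `check_node "$(hostname)"` run inside an salloc session is where the behavior described above should show up as a warning.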

How Has This Been Tested?

On 'hansen' I ran:

$ time env \
     JOB_NAME=Trilinos-atdm-hansen-shiller-gnu-debug-serial \
     WORKSPACE=$PWD \
     Trilinos_PACKAGES=Kokkos,Teuchos \
     CTEST_TEST_TYPE=Experimental \
     CTEST_DO_SUBMIT=OFF \
     CTEST_DO_UPDATES=OFF \
     CTEST_START_WITH_EMPTY_BINARY_DIRECTORY=FALSE \
   ~/Trilinos.base/Trilinos/cmake/ctest/drivers/atdm/smart-jenkins-driver.sh \
     &> console.out

real    66m37.153s
user    0m0.102s
sys     0m0.114s

The console.out file showed:

+ /usr/bin/srun -N 1 --constraint=k80 -J Trilinos-atdm-hansen-shiller-gnu-debug-serial --time=180 /ascldap/users/rabartl/Trilinos.base/BUILD/HANSEN/JENKINS_DRIVER/Trilinos/cmake/ctest/drivers/atdm/ctest-s-driver.sh
srun: job 1003128 queued and waiting for resources
srun: job 1003128 has been allocated resources

Starting processing .bash_profile


Skipping load of the dev env because /ascldap/users/rabartl/load_dev_env.sh does not exist!


Ending processing .bash_profile


Start: ctest-s-driver.sh

  ==> Wed Jun 13 18:21:21 MDT 2018

Loading env and running ctest -S comamnd to configure, build, and test ...
Current dir: /ascldap/users/rabartl/Trilinos.base/BUILD/HANSEN/JENKINS_DRIVER/SRC_AND_BUILD
Hostname 'hansen04' matches known ATDM host 'hansen' and system 'shiller'
ATDM_CONFIG_TRILNOS_DIR = /home/rabartl/Trilinos.base/Trilinos
Setting default compiler and build options for JOB_NAME='Trilinos-atdm-hansen-shiller-gnu-debug-serial'
Using hansen/shiller compiler stack GNU to build DEBUG code with Kokkos node type SERIAL

See, it showed:

Hostname 'hansen04' matches known ATDM host 'hansen' and system 'shiller'

That shows that it is actually running on a compute node.

I also verified, by looking in the file /SRC_AND_BUILD/BUILD/Testing/Temporary/LastConfigure_20180614-0021.log, that it shows:

-- Trilinos_HOSTNAME='hansen04'
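The same check can be scripted rather than done by eye. This is a rough sketch only: `configure_log_hostname` is a hypothetical helper, and it assumes the configure log contains a line of exactly the form shown above.

```shell
#!/bin/sh
# Hypothetical helper, not part of the ATDM driver scripts.
# Print the hostname recorded in a configure log, taken from the first
# line of the form:  -- Trilinos_HOSTNAME='<name>'
# Prints nothing if no such line is present.
configure_log_hostname() {
  sed -n "s/^-- Trilinos_HOSTNAME='\([^']*\)'\$/\1/p" "$1" | head -n 1
}
```

Called with the path to a LastConfigure_*.log file, it prints the recorded hostname, which can then be compared against the login node name.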
