Significant frequent CDash submit failures to testing-vm.sandia.gov/cdash Trilinos project starting mid 11/2018
Created by: bartlettroscoe
@trilinos/framework, @fryeguy52
Next Action Status
After addressing the jenkins-srn.sandia.gov builds crashing by moving start times (see TRIL-237), we finally saw a day, 12/19/2018, with full build and test results on CDash. We are still seeing some problems here and there, but they are being addressed in other issues (see https://gitlab.kitware.com/snl/project-1/issues/79 and https://sems-atlassian-son.sandia.gov/jira/browse/TRIL-237).
Description
Starting around 11/18/2018, we began seeing significant numbers of CDash submit failures. As a result, up to 9 of the 30 promoted ATDM Trilinos builds have failed to submit test results, as shown, for example, on CDash on 11/25/2018. The Jenkins output indicates that these are CDash submit failures. Failures to submit test results are the most common, but we also see failures to submit 'update' and 'configure' results in:
showing:
00:25:14 Submit files (using http)
00:25:14 Send to track: ATDM
00:25:14 Using HTTP submit method
00:25:14 Drop site:http://testing-vm.sandia.gov/cdash/submit.php?project=Trilinos
00:27:21 Submit failed, waiting 3 seconds...
00:27:24 Retry submission: Attempt 1 of 5
00:28:06 Submission failed: Checksum failed for file. Expected 9cc616d7c7444ddd408b2aa89977a0b8 but got 636f51bd65019e2d3d39d4fdb012ef00.
00:28:06 Submit failed, waiting 3 seconds...
00:28:09 Retry submission: Attempt 2 of 5
00:28:44 Submission failed: Checksum failed for file. Expected 9cc616d7c7444ddd408b2aa89977a0b8 but got d41d8cd98f00b204e9800998ecf8427e.
00:28:44 Submit failed, waiting 3 seconds...
00:28:47 Retry submission: Attempt 3 of 5
00:29:34 Submission failed: Checksum failed for file. Expected 9cc616d7c7444ddd408b2aa89977a0b8 but got 64fec38d5fd635493fd311dc65ff631c.
00:29:34 Submit failed, waiting 3 seconds...
00:29:37 Retry submission: Attempt 4 of 5
00:30:04 Submission failed: Checksum failed for file. Expected 9cc616d7c7444ddd408b2aa89977a0b8 but got d41d8cd98f00b204e9800998ecf8427e.
00:30:04 Submit failed, waiting 3 seconds...
00:30:07 Retry submission: Attempt 5 of 5
00:32:14 Error when uploading file: /home/jenkins/hansen/workspace/Trilinos-atdm-hansen-shiller-cuda-8.0-opt/SRC_AND_BUILD/BUILD/Testing/20181125-0400/Configure.xml
00:32:14 Error message was: Failed to connect to testing-vm.sandia.gov port 80: Connection timed out
00:32:14 Problems when submitting via HTTP
and failures to submit 'build' results in:
showing:
11:50:15 Submit files (using http)
11:50:15 Send to track: ATDM
11:50:15 Using HTTP submit method
11:50:15 Drop site:http://testing-vm.sandia.gov/cdash/submit.php?project=Trilinos
11:50:51 Submission failed: Checksum failed for file. Expected b6b1936e73c4fcda5459c4c3305718ad but got f33c85574fc1714cb55e743d00c37f63.
11:50:51 Submit failed, waiting 3 seconds...
11:50:54 Retry submission: Attempt 1 of 5
11:53:01 Submit failed, waiting 3 seconds...
11:53:04 Retry submission: Attempt 2 of 5
11:55:11 Submit failed, waiting 3 seconds...
11:55:14 Retry submission: Attempt 3 of 5
11:57:22 Submit failed, waiting 3 seconds...
11:57:25 Retry submission: Attempt 4 of 5
11:57:45 Submission failed: Checksum failed for file. Expected b6b1936e73c4fcda5459c4c3305718ad but got d41d8cd98f00b204e9800998ecf8427e.
11:57:45 Submit failed, waiting 3 seconds...
11:57:48 Retry submission: Attempt 5 of 5
11:58:35 Submission failed: Checksum failed for file. Expected b6b1936e73c4fcda5459c4c3305718ad but got 4822ffa741a1777a465e8cfdd7764c97.
11:58:35 Uploaded: /home/jenkins/hansen/workspace/Trilinos-atdm-hansen-shiller-cuda-9.0-debug/SRC_AND_BUILD/BUILD/Testing/20181125-0400/Build.xml
11:58:35 Errors occurred during submission.
and of course failures to submit 'test' results in:
showing:
Submit files (using http)
Send to track: ATDM
Using HTTP submit method
Drop site:http://testing-vm.sandia.gov/cdash/submit.php?project=Trilinos
Submission failed: Checksum failed for file. Expected 4a3869a85b2dacb9947d040b2c22673d but got f898f5520369486d65c0a4e5bf510f8b.
Submit failed, waiting 3 seconds...
Retry submission: Attempt 1 of 5
Submission failed: Checksum failed for file. Expected 4a3869a85b2dacb9947d040b2c22673d but got f6d6ed6443bb76d6f2ecdf8c9103c065.
Submit failed, waiting 3 seconds...
Retry submission: Attempt 2 of 5
Submission failed: Checksum failed for file. Expected 4a3869a85b2dacb9947d040b2c22673d but got bf51a2e520d51fc30a9a06459bbf88e9.
Submit failed, waiting 3 seconds...
Retry submission: Attempt 3 of 5
Submission failed: Checksum failed for file. Expected 4a3869a85b2dacb9947d040b2c22673d but got c8f2a8d360b11a165ebd854dd3676604.
Submit failed, waiting 3 seconds...
Retry submission: Attempt 4 of 5
Submission failed: Checksum failed for file. Expected 4a3869a85b2dacb9947d040b2c22673d but got a1292fc160823ef0142d7f01a960dd47.
Submit failed, waiting 3 seconds...
Retry submission: Attempt 5 of 5
Submission failed: Checksum failed for file. Expected 4a3869a85b2dacb9947d040b2c22673d but got 94252fe24f52c3dafae1f00fe671ad4c.
Uploaded: /home/jenkins/hansen/workspace/Trilinos-atdm-hansen-shiller-gnu-debug-openmp/SRC_AND_BUILD/BUILD/Testing/20181125-0400/Test.xml
Errors occurred during submission.
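To make the pattern in the logs above easier to quantify across many Jenkins jobs, here is a minimal sketch of a log summarizer. It counts checksum mismatches and connection failures, and separately flags mismatches where the received checksum is d41d8cd98f00b204e9800998ecf8427e, which is the MD5 digest of zero bytes. This assumes the checksums in the submit output are MD5 digests, which their 32-hex-digit form and that value suggest. If so, those attempts may mean the receiving side saw an empty payload, but that is an inference from the logs, not something confirmed here. The function name `summarize_submit_log` and the summary keys are hypothetical and are not part of CTest, CDash, or the Trilinos scripts.

```python
import hashlib
import re

# MD5 digest of zero bytes, computed rather than hard-coded.
EMPTY_MD5 = hashlib.md5(b"").hexdigest()

# Patterns copied from the message wording in the submit output above.
CHECKSUM_RE = re.compile(
    r"Checksum failed for file\. Expected ([0-9a-f]{32}) but got ([0-9a-f]{32})"
)
CONNECT_RE = re.compile(r"Failed to connect to \S+ port \d+")


def summarize_submit_log(text):
    """Count the kinds of submit failure found in submit output (hypothetical helper)."""
    summary = {"checksum_mismatch": 0, "empty_payload": 0, "connect_failure": 0}
    for line in text.splitlines():
        match = CHECKSUM_RE.search(line)
        if match:
            summary["checksum_mismatch"] += 1
            # Received digest equals the MD5 of empty input.
            if match.group(2) == EMPTY_MD5:
                summary["empty_payload"] += 1
        elif CONNECT_RE.search(line):
            summary["connect_failure"] += 1
    return summary


# Three lines taken from the 'configure' submit output quoted in this issue.
SAMPLE = """\
00:28:06 Submission failed: Checksum failed for file. Expected 9cc616d7c7444ddd408b2aa89977a0b8 but got 636f51bd65019e2d3d39d4fdb012ef00.
00:28:44 Submission failed: Checksum failed for file. Expected 9cc616d7c7444ddd408b2aa89977a0b8 but got d41d8cd98f00b204e9800998ecf8427e.
00:32:14 Error message was: Failed to connect to testing-vm.sandia.gov port 80: Connection timed out
"""

if __name__ == "__main__":
    print(summarize_submit_log(SAMPLE))
```

Running this over the console text of each failing job would give a rough per-job breakdown of timeouts versus checksum mismatches, which might help tell apart a server that is unreachable from one that is receiving truncated or empty uploads.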
But this is not only impacting the ATDM Trilinos builds. For example, if you look at the submits of the "Clean" builds since 11/01/2018 shown here, you can see test results missing as far back as 11/16/2018, and more recently we see significant numbers of missing test results on 11/22/2018, 11/25/2018, and today, 11/26/2018.