Stratimikos: Amesos2 SuperLU_DIST test failing
Created by: ibaned
This is an unusual issue in that the relevant code is not yet in Trilinos at the time of posting, but it will help us track problems with PR #1090. If one checks out that code and builds it with Thyra, Tpetra, Amesos2, Stratimikos, KLU2, and SuperLUDist enabled, the following command:
ctest -VV -R Stratimikos_test_single_amesos2_tpetra_solver_driver_SuperLU_DIST_MPI_1
produces the following output:
UpdateCTestConfiguration from :/home/daibane/build/host/Trilinos/DartConfiguration.tcl
Parse Config file:/home/daibane/build/host/Trilinos/DartConfiguration.tcl
Add coverage exclude regular expressions.
SetCTestConfiguration:CMakeCommand:/usr/local/bin/cmake
UpdateCTestConfiguration from :/home/daibane/build/host/Trilinos/DartConfiguration.tcl
Parse Config file:/home/daibane/build/host/Trilinos/DartConfiguration.tcl
Test project /home/daibane/build/host/Trilinos
Constructing a list of tests
Done constructing a list of tests
Checking test dependency graph...
Checking test dependency graph end
test 13
Start 13: Stratimikos_test_single_amesos2_tpetra_solver_driver_SuperLU_DIST_MPI_1
13: Test command: /home/daibane/install/host/mpich/bin/mpiexec "-np" "1" "/home/daibane/build/host/Trilinos/packages/stratimikos/adapters/amesos2/test/Stratimikos_test_single_amesos2_tpetra_solver_driver.exe" "--show-all-tests" "--solver-type=SuperLU_DIST" "--verbose" "--matrix-file=A.mm"
13: Test timeout computed to be: 1500
13: Teuchos::GlobalMPISession::GlobalMPISession(): started processor with name westley.srn.sandia.gov and rank 0!
13:
13: ***
13: *** Testing Thyra::BelosLinearOpWithSolveFactory (and Thyra::BelosLinearOpWithSolve)
13: ***
13:
13: Echoing input options:
13: matrixFile = A.mm
13: numRhs = 1
13: numRandomVectors = 1
13: maxFwdError = 1e-14
13: maxResid = 1e-06
13: showAllTests = 1
13: dumpAll = 0
13:
13: A) Reading in a tpetra matrix A from the file 'A.mm' ...
13:
13: B) Creating an Amesos2LinearOpWithSolveFactory object opFactory ...
13:
13: lowsFactory.getValidParameters():
13: Solver Type : string = KLU2
13: Refactorization Policy : string = RepivotOnRefactorization
13: Throw on Preconditioner Input : bool = 1
13: VerboseObject ->
13: Verbosity Level : string = default
13: Output File : string = none
13:
13: amesos2LOWSFPL before setting parameters:
13: Solver Type : string = SuperLU_DIST [unused]
13:
13: amesos2LOWSFPL after setting parameters:
13: Solver Type : string = SuperLU_DIST
13: Refactorization Policy : string = RepivotOnRefactorization [default]
13: Throw on Preconditioner Input : bool = 1 [default]
13: VerboseObject ->
13: Output File : string = none [default]
13: Verbosity Level : string = default [default]
13:
13: C) Creating a Amesos2LinearOpWithSolve object nsA from A ...
13: .. Use parMETIS ordering on A'+A with 1 sub-domains.
13: Max szBlk 128
13: Parameters: fill mem 5 fill pelt 5
13: Nonzeros in L 29971
13: Nonzeros in U 19971
13: nonzeros in L+U-I 49942
13: No of supers 9990
13: Size of G(L) 29952
13: Size of G(U) 19962
13: Size of G(L+U) 49914
13: ParSYMBfact (MB) : L\U MAX 0.68 AVG 0.68
13: .. # L blocks 29933 # U blocks 19943
13: MPI tag upper bound = 268435455
13: .. Starting with 1 OpenMP threads
13: === using DAG ===
13: * init: 3.021002e-03 seconds
13: .. thresh = s_eps 5.960464e-08 * anorm 3.999800e+04 = 2.384067e-03
13: .. Buffer size: Lsub 11 Lval 9 Usub 11 Uval 2 LDA 3
13: [0] .. BIG U size 3072
13: [0] .. BIG V size 131072
13: Max row size is 3
13: Using buffer_size of 5000000
13: Threads per process 1
13: Time in scattering 0.000000
13: Time in dgemm 0.000000
13: Total time spent in schur update is : 0.01 seconds,
13: Total Time in Factorization : 0.02 seconds,
13: Time (other GEMM and Scatter) : 0.02 seconds,
13: Total time spent in schur update when offload : 0.00 seconds,
13:
13: D) Testing the LinearOpBase interface of nsA ...
13:
13: *** Entering LinearOpTester<double,double>::check(op,...) ...
13:
13: describe op:
13: Thyra::Amesos2LinearOpWithSolve<double>{rangeDim=10000,domainDim=10000}
13: fwdOp = Thyra::TpetraLinearOp<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >{rangeDim=10000,domainDim=10000}
13: amesos2Solver=Amesos2::Superludist<Tpetra::CrsMatrix<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>, false>, Tpetra::MultiVector<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>, false> >
13:
13: Checking the domain and range spaces ...
13: op.domain().get() != NULL ? passed
13:
13: op.range().get() != NULL ? passed
13:
13: this->check_linear_properties()==true:Checking the linear properties of the forward linear operator ... op.opSupported(NOTRANS) = true == true : passed
13:
13: Checking that the forward operator is truly linear:
13:
13: 0.5*op*(v1 + v2) == 0.5*op*v1 + 0.5*op*v2
13:         \_____/         \___/
13:           v3              v5
13: \_____________/     \___________________/
13:       v4                     v5
13:
13: sum(v4) == sum(v5)
13:
13: Random vector tests = 1
13:
13: v1 = randomize(-1,+1); ...
13:
13: v2 = randomize(-1,+1); ...
13:
13: v3 = v1 + v2 ...
13:
13: v4 = 0.5*op*v3 ...
13:
13: v5 = op*v1 ...
13:
13: v5 = 0.5*op*v2 + 0.5*v5 ...
13:
13: Check: rel_err(sum(v4), sum(v5))
13: = rel_err(-0.37757, -0.37757) = 3.23449e-15
13: <= linear_properties_error_tol() = 1e-14 : passed
13: Warning! rel_err(sum(v4), sum(v5))
13: = rel_err(-0.37757, -0.37757) = 3.23449e-15
13: >= linear_properties_warning_tol() = 1e-16!
13:
13: (this->check_linear_properties()&&this->check_adjoint())==false: Skipping the check of the linear properties of the adjoint operator!
13:
13: this->check_adjoint()==false: Skipping check for the agreement of the adjoint and forward operators!
13:
13: this->check_for_symmetry()==false: Skipping check of symmetry ...
13:
13: Congratulations, this LinearOpBase object seems to check out!
13:
13: *** Leaving LinearOpTester<double,double>::check(...)
13:
13: E) Testing the LinearOpWithSolveBase interface of nsA ...
13:
13: *** Entering LinearOpWithSolveTester<double>::check(op,...) ...
13:
13: describe forward op:
13: Thyra::Amesos2LinearOpWithSolve<double>{rangeDim=10000,domainDim=10000}
13: fwdOp = Thyra::TpetraLinearOp<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >{rangeDim=10000,domainDim=10000}
13: amesos2Solver=Amesos2::Superludist<Tpetra::CrsMatrix<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>, false>, Tpetra::MultiVector<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>, false> >
13:
13: this->check_forward_default()==true: Checking the default forward solve ... op.solveSupports(NOTRANS) = true == true : passed
13:
13: Checking that the forward default solve matches the forward operator:
13:
13: inv(Op)*Op*v1 == v1
13:         \___/
13:          v2
13: \___________/
13:      v3
13:
13: v4 = v3-v1
13: v5 = Op*v3-v2
13:
13: norm(v4)/norm(v1) <= forward_default_solution_error_error_tol()
13: norm(v5)/norm(v2) <= forward_default_residual_error_tol()
13:
13: Random vector tests = 1
13:
13: v1 = randomize(-1,+1); ...
13:
13: v2 = Op*v1 ...
13:
13: => Apply time = 8.10623e-05 sec
13:
13: v3 = inv(Op)*v2 ...
13:
13: Solving system using Amesos2 solver Amesos2::Superludist<Tpetra::CrsMatrix<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>, false>, Tpetra::MultiVector<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>, false> > ...
13:
13:
13: => Solve time = 0.007236 sec
13:
13: solve status:
13: solveStatus = SOLVE_STATUS_CONVERGED
13: achievedTol = unknownTolerance()
13: message:extraParameters: NONE
13:
13: v4 = v3 - v1 ...
13:
13: v5 = Op*v3 - v2 ...
13:
13: => Apply time = 7.10487e-05 sec
13:
13: Check: |norm(v4)/norm(v1)| = 0.29299 <= forward_default_solution_error_error_tol() = 1e-06 : FAILED
13:
13: Check: |norm(v5)/norm(v2)| = 5.91491e-06 <= forward_default_residual_error_tol() = 2e-06 : FAILED
13:
13: this->check_forward_residual()==true: Checking the forward solve with a tolerance on the residual ... op.solveSupports(NOTRANS) = true == true : passed
13:
13: Checking that the forward solve matches the forward operator to a residual tolerance:
13:
13: v3 = inv(Op)*Op*v1
13:              \___/
13:               v2
13:
13: v4 = Op*v3-v2
13:
13: norm(v4)/norm(v2) <= forward_residual_solve_tol() + forward_residual_slack_error_tol()
13:
13: Random vector tests = 1
13:
13: v1 = randomize(-1,+1); ...
13:
13: v2 = Op*v1 ...
13:
13: => Apply time = 6.79493e-05 sec
13:
13: v3 = inv(Op)*v2 ...
13:
13: Solving system using Amesos2 solver Amesos2::Superludist<Tpetra::CrsMatrix<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>, false>, Tpetra::MultiVector<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>, false> > ...
13:
13:
13: => Solve time = 0.0063262 sec
13:
13: solve status:
13: solveStatus = SOLVE_STATUS_CONVERGED
13: achievedTol = unknownTolerance()
13: message:extraParameters: NONE
13:
13: check: solveStatus = SOLVE_STATUS_CONVERGED == SOLVE_STATUS_CONVERGED : passed
13:
13: v4 = Op*v3 - v2 ...
13:
13: => Apply time = 7.00951e-05 sec
13:
13: Check: |norm(v4)/norm(v2)| = 6.72255e-06 <= forward_residual_solve_tol()+forward_residual_slack_error_tol() = 2e-06 : FAILED
13:
13: this->check_adjoint_default()==false: Skipping the check of the adjoint solve with a default tolerance!
13:
13: this->check_adjoint_residual()==false: Skipping the check of the adjoint solve with a tolerance on the residual!
13:
13: Oh no, at least one of the tests performed with this LinearOpWithSolveBase object failed (see above failures)!
13:
13: *** Leaving LinearOpWithSolveTester<double>::check(...)
13:
13: amesos2LOWSFPL after solving:
13: Solver Type : string = SuperLU_DIST
13: Refactorization Policy : string = RepivotOnRefactorization [default]
13: Throw on Preconditioner Input : bool = 1 [default]
13: VerboseObject ->
13: Output File : string = none [default]
13: Verbosity Level : string = default [default]
13:
13: Oh no! At least one of the tests failed!
1/1 Test #13: Stratimikos_test_single_amesos2_tpetra_solver_driver_SuperLU_DIST_MPI_1 ...***Failed 0.44 sec
0% tests passed, 1 tests failed out of 1
Label Time Summary:
Stratimikos = 0.44 sec (1 test)
Total Test time (real) = 0.48 sec
The following tests FAILED:
13 - Stratimikos_test_single_amesos2_tpetra_solver_driver_SuperLU_DIST_MPI_1 (Failed)
Errors while running CTest
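Part (E) checks the direct solve by taking a random v1, forming v2 = Op*v1, solving for v3 = inv(Op)*v2, and then comparing the relative solution error norm(v4)/norm(v1) against 1e-06 and the relative residual norm(v5)/norm(v2) against 2e-06. The following stand-alone Python sketch mirrors that logic for illustration only. It is not Thyra's LinearOpWithSolveTester: the small tridiagonal matrix stands in for A.mm, a Thomas-algorithm solve stands in for SuperLU_DIST, every function name is hypothetical, and norm() is assumed to be the Euclidean 2-norm.

```python
import math
import random

# Hypothetical stand-ins: a tridiagonal matrix replaces A.mm and a
# Thomas-algorithm solve replaces the Amesos2/SuperLU_DIST solve.


def tridiag_matvec(lower, diag, upper, x):
    """Return y = A*x for a tridiagonal A given by its three diagonals."""
    n = len(diag)
    y = [0.0] * n
    for i in range(n):
        y[i] = diag[i] * x[i]
        if i > 0:
            y[i] += lower[i - 1] * x[i - 1]  # A[i][i-1]
        if i < n - 1:
            y[i] += upper[i] * x[i + 1]      # A[i][i+1]
    return y


def tridiag_solve(lower, diag, upper, b):
    """Solve A*x = b with the Thomas algorithm (no pivoting)."""
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0] if n > 1 else 0.0
    d[0] = b[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i - 1] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (b[i] - lower[i - 1] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def norm2(v):
    """Euclidean 2-norm (assumed to match the tester's norm())."""
    return math.sqrt(sum(vi * vi for vi in v))


def forward_default_check(lower, diag, upper, solve,
                          sol_tol=1e-6, res_tol=2e-6, seed=0):
    """Mimic the two checks in part (E); returns (sol_err, res_err, sol_ok, res_ok)."""
    rng = random.Random(seed)
    v1 = [rng.uniform(-1.0, 1.0) for _ in diag]   # v1 = randomize(-1,+1)
    v2 = tridiag_matvec(lower, diag, upper, v1)    # v2 = Op*v1
    v3 = solve(lower, diag, upper, v2)             # v3 = inv(Op)*v2
    v4 = [a - b for a, b in zip(v3, v1)]           # v4 = v3 - v1
    av3 = tridiag_matvec(lower, diag, upper, v3)
    v5 = [a - b for a, b in zip(av3, v2)]          # v5 = Op*v3 - v2
    sol_err = norm2(v4) / norm2(v1)
    res_err = norm2(v5) / norm2(v2)
    return sol_err, res_err, sol_err <= sol_tol, res_err <= res_tol


def perturbed_solve(lower, diag, upper, b):
    """Deliberately wrong 'solver': factors A with a 1% larger diagonal."""
    return tridiag_solve(lower, [x * 1.01 for x in diag], upper, b)


# Hypothetical test matrix: tridiag(-1, 4, -1) of size 100.
n = 100
lower, diag, upper = [-1.0] * (n - 1), [4.0] * n, [-1.0] * (n - 1)

good = forward_default_check(lower, diag, upper, tridiag_solve)
bad = forward_default_check(lower, diag, upper, perturbed_solve)
print("correct solver:  ", good)
print("perturbed solver:", bad)
```

The correct solver passes both checks with errors near machine precision. The deliberately perturbed solver fails both, as the SuperLU_DIST run above does; the 1% diagonal shift only illustrates what such failures look like and is not a diagnosis of this bug.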
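It looks like the most significant issue is that inv(A)*A*v != v in part (E) of the testing: the relative solution error norm(v4)/norm(v1) is 0.29299, far above its tolerance of 1e-06. The relative residuals (5.91491e-06 and 6.72255e-06) also miss their 2e-06 tolerance, though by a much smaller margin.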
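The two failed checks in part (E) can be related by the standard perturbation bound. Taking x = v1 as the exact solution, \hat{x} = v3 as the computed one, b = v2, and r = A\hat{x} - b = v5, and assuming norm() is a vector norm with \kappa(A) measured in the induced matrix norm, the identities \hat{x} - x = A^{-1} r and \|b\| \le \|A\|\,\|x\| give the first line below; plugging in the reported solution error and residual gives the second:

$$
\frac{\|\hat{x}-x\|}{\|x\|} \le \kappa(A)\,\frac{\|r\|}{\|b\|}, \qquad \kappa(A) = \|A\|\,\|A^{-1}\|,
$$
$$
0.29299 \le \kappa(A)\times 5.91491\times10^{-6} \;\Longrightarrow\; \kappa(A) \gtrsim 4.95\times10^{4}.
$$

Taken at face value, the reported numbers are only consistent with a condition number of at least about 5e4 (or with an inaccurate norm computation); the bound by itself does not say whether the matrix or the SuperLU_DIST solve is responsible for the large error.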
@srajama1