Summer 2000 transverse lattice project:
Diagonalization of sparse symmetric matrices on parallel computers

The transverse lattice is a non-perturbative approach to solving the bound-state problem of Quantum Chromodynamics, the theory of the strong force. The ultimate goal of this work is to calculate various properties of the hadrons (protons, neutrons, and the like), which are of great experimental interest but are difficult or impossible to calculate using other approaches. More detail can be found in a Postscript file.

This project focuses on the more computational aspects of the transverse lattice. From a computational point of view, we must calculate the lowest eigenvalues and eigenvectors of very large matrices. We are developing computer code that can do these computations efficiently on a large parallel computer. Our application for computing resources at PSC describes the status of the project in 1999.
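
The large-matrix packages discussed below (LASO, PLANSO) are all Lanczos codes: they approximate the extreme eigenvalues of a large symmetric matrix using nothing but matrix-vector products. As a purely illustrative sketch (not the project code, and without the reorthogonalization that the real packages perform), the basic recurrence looks like the following. MATVEC is a hypothetical placeholder for the application's sparse matrix-vector product; DDOT and DNRM2 are the standard BLAS routines.

      SUBROUTINE LANCZOS(N, NSTEPS, ALPHA, BETA, V, VOLD, W)
C     Illustrative sketch of the symmetric Lanczos recurrence without
C     reorthogonalization.  On exit, ALPHA(1..NSTEPS) is the diagonal
C     and BETA(2..NSTEPS) the off-diagonal of a small tridiagonal
C     matrix whose extreme eigenvalues approximate those of the large
C     matrix.  MATVEC is a hypothetical user routine for W = A*V.
      INTEGER N, NSTEPS, I, J
      DOUBLE PRECISION ALPHA(NSTEPS), BETA(NSTEPS)
      DOUBLE PRECISION V(N), VOLD(N), W(N)
      DOUBLE PRECISION DDOT, DNRM2
      EXTERNAL MATVEC, DDOT, DNRM2
C     V must enter as a unit-norm starting vector, VOLD as zero.
      BETA(1) = 0.0D0
      DO 30 J = 1, NSTEPS
         CALL MATVEC(N, V, W)
         ALPHA(J) = DDOT(N, V, 1, W, 1)
         DO 10 I = 1, N
            W(I) = W(I) - ALPHA(J)*V(I) - BETA(J)*VOLD(I)
   10    CONTINUE
         IF (J .LT. NSTEPS) THEN
            BETA(J+1) = DNRM2(N, W, 1)
            DO 20 I = 1, N
               VOLD(I) = V(I)
               V(I) = W(I)/BETA(J+1)
   20       CONTINUE
         END IF
   30 CONTINUE
      RETURN
      END

The eigenvalues of the small tridiagonal matrix can then be found with a dense routine (e.g. from LAPACK); the point is that the large matrix is touched only through MATVEC, which is what makes the method suitable for very large sparse problems.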


Diagonalizing sparse symmetric matrices with PLANSO

We currently have the PLANDR2() routine of the PLANSO package (by Kesheng Wu and Horst Simon) running on both a Cray T3E and on a cluster of PCs.

The T3E machine is a Cray T3E-900 located at the Pittsburgh Supercomputing Center (PSC) with 512 application PEs; each PE is a 450 MHz Alpha processor with 128 MB of RAM. The PC cluster consists of Intel-based machines: two dual-processor 400 MHz Pentium II machines and one single-processor 400 MHz Pentium II machine, each with 256 MB of RAM.
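
A key ingredient in such a run is the parallel sparse matrix-vector product that the Lanczos routine applies at every step. Our own distribution of the Hamiltonian over the PEs is not described on this page; for illustration only, the sketch below shows one common arrangement, in which each PE owns a contiguous block of rows in compressed-row storage and assembles the full vector with MPI_ALLGATHERV before the local multiply. All routine and variable names are ours, not taken from PLANSO or the project code.

      SUBROUTINE PARMATVEC(NLOC, NGLOB, IA, JA, VAL, XLOC, YLOC,
     &                     XFULL, COUNTS, DISPLS, COMM)
C     Illustrative distributed sparse matrix-vector product Y = A*X.
C     Each PE owns NLOC contiguous rows of the NGLOB x NGLOB matrix in
C     compressed-row form (IA, JA, VAL).  COUNTS and DISPLS hold the
C     number of rows owned by each PE and their offsets, as required
C     by MPI_ALLGATHERV.
      INCLUDE 'mpif.h'
      INTEGER NLOC, NGLOB, COMM, IERR
      INTEGER IA(NLOC+1), JA(*), COUNTS(*), DISPLS(*)
      DOUBLE PRECISION VAL(*), XLOC(NLOC), YLOC(NLOC), XFULL(NGLOB)
      INTEGER I, K
      DOUBLE PRECISION S
C     Assemble the full vector x on every PE from the local pieces.
      CALL MPI_ALLGATHERV(XLOC, NLOC, MPI_DOUBLE_PRECISION,
     &                    XFULL, COUNTS, DISPLS,
     &                    MPI_DOUBLE_PRECISION, COMM, IERR)
C     Multiply the locally owned rows.
      DO 20 I = 1, NLOC
         S = 0.0D0
         DO 10 K = IA(I), IA(I+1) - 1
            S = S + VAL(K)*XFULL(JA(K))
   10    CONTINUE
         YLOC(I) = S
   20 CONTINUE
      RETURN
      END

Gathering the whole vector onto every PE is simple but communication-heavy; it is the kind of step where the difference between the T3E's interconnect and a PC cluster's network shows up in the timings below.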

Test runs for large matrices

PC cluster

On the PC cluster, using the five available PEs, we have successfully diagonalized a sparse symmetric matrix of order 282,274 with 13,194,799 non-zero elements.

Info we care about:

  Average CPU usage:                   60-80%
  Memory usage:                        approx. 100 MB
  Approx. avg. diagonalizing time/PE:  209 seconds
  Lanczos steps taken:                 124
  # of Ritz values found:              5
  Error bound:                         less than +/- 1.0E-09

Also on the PC cluster, we have diagonalized a slightly smaller matrix of order 206,622 but with more non-zero elements, 15,474,058.

Info we care about:

  Average CPU usage:                   60-80%
  Memory usage:                        approx. 97 MB
  Approx. avg. diagonalizing time/PE:  189 seconds
  Lanczos steps taken:                 134
  # of Ritz values found:              5
  Error bound:                         less than +/- 1.0E-09

T3E

On the Cray T3E at Pittsburgh, we have so far successfully diagonalized a matrix of order 1,037,921 containing 57,694,268 non-zero elements, using half of the available PEs (256).
Info we care about:

  Average CPU usage:                   unknown
  Memory usage:                        approx. 85 MB
  Approx. avg. diagonalizing time/PE:  132 seconds
  Lanczos steps taken:                 127
  # of Ritz values found:              5
  Error bound:                         less than +/- 1.0E-09

Comparing DNLASO and PLANDR2

Our first eigensolver package for large matrices was LASO, of which we used the DNLASO routine. LASO is a uniprocessor package, so we compared it with the PLANSO package run on a single PE. The test matrix had order 110,488 and contained 7,313,136 non-zero elements; this was the largest matrix we could diagonalize with both packages using only a single PE. We also compared these results with multi-PE runs of PLANSO on both the PC cluster and the T3E, using 5 PEs on each and the same test matrix.

A comparison on the PC cluster

                                       DNLASO          PLANDR2              PLANDR2
                                       (1 PE)          (1 PC cluster PE)    (5 PC cluster PEs)
  Diagonalizing time/PE                569 sec.        158 sec.             86 sec.
  % speedup from previous              ----            350%                 180%
  CPU usage                            97-99%          97-99%               60-80%
  Mem. usage for Lanczos vectors/PE    5 MB            60 MB                19 MB
  Lanczos steps                        162             133                  122

A comparison between the PC cluster and the T3E

                                       PLANDR2                         PLANDR2
                                       (5 PC cluster PEs)              (5 T3E PEs)
  Diagonalizing time                   86 sec.                         37 sec.
  % speedup from previous              180% (from 1 PC cluster PE)     230%
  CPU usage                            60-80%                          unknown
  Mem. usage for Lanczos vectors/PE    19 MB                           approx. 20 MB
  Lanczos steps                        122                             131

Comparing the multi-PE T3E run with the multi-PE PC cluster run shows the PC cluster's relative lack of inter-PE communication efficiency. CPU usage on the T3E could not be measured, but it is easy to deduce from the diagonalization times that the T3E achieves higher CPU utilization than our local PC cluster, mostly because communication between processors is slower on the cluster.
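
The interconnect speeds themselves are not reported here. If one wanted to measure them, a standard check is a simple MPI ping-pong test between two PEs; the program below is an illustrative sketch, not part of the project code.

      PROGRAM PINGPONG
C     Illustrative MPI ping-pong test: PE 0 sends NDBL double
C     precision numbers to PE 1, which sends them straight back.
C     The average round-trip time gives a rough measure of the
C     latency and bandwidth of the interconnect.
      INCLUDE 'mpif.h'
      INTEGER NDBL, NREP
      PARAMETER (NDBL = 10000, NREP = 100)
      DOUBLE PRECISION BUF(NDBL), T0, T1
      INTEGER RANK, IERR, I, STATUS(MPI_STATUS_SIZE)
      CALL MPI_INIT(IERR)
      CALL MPI_COMM_RANK(MPI_COMM_WORLD, RANK, IERR)
      DO 5 I = 1, NDBL
         BUF(I) = 0.0D0
    5 CONTINUE
      T0 = MPI_WTIME()
      DO 10 I = 1, NREP
         IF (RANK .EQ. 0) THEN
            CALL MPI_SEND(BUF, NDBL, MPI_DOUBLE_PRECISION, 1, 0,
     &                    MPI_COMM_WORLD, IERR)
            CALL MPI_RECV(BUF, NDBL, MPI_DOUBLE_PRECISION, 1, 0,
     &                    MPI_COMM_WORLD, STATUS, IERR)
         ELSE IF (RANK .EQ. 1) THEN
            CALL MPI_RECV(BUF, NDBL, MPI_DOUBLE_PRECISION, 0, 0,
     &                    MPI_COMM_WORLD, STATUS, IERR)
            CALL MPI_SEND(BUF, NDBL, MPI_DOUBLE_PRECISION, 0, 0,
     &                    MPI_COMM_WORLD, IERR)
         END IF
   10 CONTINUE
      T1 = MPI_WTIME()
      IF (RANK .EQ. 0) PRINT *, 'avg round trip (s):', (T1-T0)/NREP
      CALL MPI_FINALIZE(IERR)
      END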

Note that the store function provided in the file "store.f" was slightly modified when building the lanso1 and plan libraries: the array A was enlarged from 500,000 to 5,000,000 elements to accommodate the large Lanczos vectors.
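
For reference, store.f keeps the Lanczos vectors in a fixed-size in-memory workspace, so the modification amounts to nothing more than enlarging that array. The lines below sketch only the change itself; the surrounding code in store.f is not reproduced here and may differ in detail.

C     In store.f (sketch of the change only):
C     before:
C         DOUBLE PRECISION A(500000)
C     after:
          DOUBLE PRECISION A(5000000)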


Student project summary

The purpose of this internship was to modify Dr. van de Sande's code so that it diagonalizes certain Hamiltonian matrices in a truly parallel manner. His code used the DNLASO subroutine of the LASO package as well as some LAPACK routines for smaller matrices. The code was parallel only in the sense that a "master" process controlled "slave" processes, each working on a separate problem, with the master gathering the answers at the end. Our goal was to modify the code so that all processes work together on the same problem and then leave the answer with the master process.
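
Schematically, the target structure looks like the following: every PE takes part in the same diagonalization and the master (rank 0) reports the answer. PARALLEL_DIAG is a hypothetical placeholder for the parallel eigensolver (eventually PLANDR2); its argument list is invented for this sketch and is not the real PLANDR2 interface.

      PROGRAM DRIVER
C     Sketch of the intended structure: all PEs cooperate on one
C     diagonalization, and the master (rank 0) ends up with and
C     reports the answer.  PARALLEL_DIAG is a hypothetical stand-in
C     for the eigensolver call; its arguments (communicator, number
C     of eigenvalues wanted, number converged, the values) are
C     invented for this sketch.
      INCLUDE 'mpif.h'
      INTEGER MAXEV
      PARAMETER (MAXEV = 5)
      INTEGER RANK, NPES, NEIG, IERR, I
      DOUBLE PRECISION EVAL(MAXEV)
      CALL MPI_INIT(IERR)
      CALL MPI_COMM_RANK(MPI_COMM_WORLD, RANK, IERR)
      CALL MPI_COMM_SIZE(MPI_COMM_WORLD, NPES, IERR)
C     Every PE calls the solver on its share of the same matrix.
      CALL PARALLEL_DIAG(MPI_COMM_WORLD, MAXEV, NEIG, EVAL)
C     Only the master reports the converged eigenvalues.
      IF (RANK .EQ. 0) THEN
         PRINT *, 'PEs used:', NPES
         PRINT *, 'lowest eigenvalues:'
         DO 10 I = 1, NEIG
            PRINT *, EVAL(I)
   10    CONTINUE
      END IF
      CALL MPI_FINALIZE(IERR)
      END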

The summer internship started with learning the Linux operating system and other fundamentals of the research to come. The next task was diagonalizing a test matrix using LAPACK routines, which also provided practice with the basics of the GNU compilers. Next, the LASO package with its DNLASO subroutine was used to diagonalize the same test matrix. A matrix-vector multiply was then written to use Dr. van de Sande's format for storing the sparse symmetric Hamiltonian matrix (a generic sketch of such a product appears below), bringing the new code to roughly the same point as his existing diagonalization code. His code and the new code were then benchmarked against each other.
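
The storage format itself belongs to Dr. van de Sande's code and is not described on this page. For illustration only, the sketch below multiplies with a symmetric matrix whose non-zero elements of the upper triangle (including the diagonal) are kept in coordinate form, so that each off-diagonal element contributes to two rows of the result; the real format and routine differ.

      SUBROUTINE SYMMATVEC(N, NNZ, IROW, JCOL, VAL, X, Y)
C     Illustrative sparse symmetric matrix-vector product Y = A*X.
C     Only the NNZ non-zero elements of the upper triangle are kept,
C     as triples (IROW(K), JCOL(K), VAL(K)) with JCOL(K) >= IROW(K);
C     each off-diagonal element is applied to both symmetric slots.
      INTEGER N, NNZ, IROW(NNZ), JCOL(NNZ)
      DOUBLE PRECISION VAL(NNZ), X(N), Y(N)
      INTEGER I, K
      DO 10 I = 1, N
         Y(I) = 0.0D0
   10 CONTINUE
      DO 20 K = 1, NNZ
         Y(IROW(K)) = Y(IROW(K)) + VAL(K)*X(JCOL(K))
         IF (IROW(K) .NE. JCOL(K)) THEN
            Y(JCOL(K)) = Y(JCOL(K)) + VAL(K)*X(IROW(K))
         END IF
   20 CONTINUE
      RETURN
      END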

The next step was researching the Cray T3E at PSC and its environment, since this is where the code would eventually be fully tested and run. After the Cray had been researched, research began on packages that could diagonalize the Hamiltonians completely in parallel; this search turned up the PLANSO package. After extensive testing and debugging, the code was run on both the PC cluster and on the Cray; the benchmarks are listed above. During the final debugging and testing, another similar package, TRLan, was discovered and also investigated. It is reported to produce the same results as PLANSO, but it could not be fully evaluated for lack of a Fortran 90 compiler. After the benchmarking and final testing, the PLANSO code was integrated with Dr. van de Sande's code, and some final modifications of the new code, his code, and the PLANSO package were made. This is where the internship left off.

Brett van de Sande,
Homepage: http://www.geneva.edu/~bvds and
E-mail: bvds@pitt.edu