Summer 2000 transverse lattice project:
Diagonalization of sparse symmetric matrices on parallel computers

The transverse lattice is a non-perturbative approach to solving the bound-state problem of Quantum Chromodynamics, the theory of the strong force. The ultimate goal of this work is to calculate various properties of the hadrons (protons, neutrons, and the like), which are of great experimental interest but are difficult or impossible to calculate using other approaches. More detail can be found in a Postscript file.

This project focuses on the more computational aspects of the transverse lattice. From a computational point of view, we must calculate the lowest eigenvalues and eigenvectors of very large matrices. We are developing computer code that can do these computations efficiently on a large parallel computer. Our application for computing resources at PSC describes the status of the project in 1999.
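
The large-matrix packages discussed below (LASO, PLANSO) are all Lanczos codes: they approximate the extreme eigenvalues of a large symmetric matrix using nothing but matrix-vector products. As a purely illustrative sketch (not the project code, and without the reorthogonalization that the real packages perform), the basic recurrence looks like the following. MATVEC is a hypothetical placeholder for the application's sparse matrix-vector product; DDOT and DNRM2 are the standard BLAS routines.

      SUBROUTINE LANCZOS(N, NSTEPS, ALPHA, BETA, V, VOLD, W)
C     Illustrative sketch of the symmetric Lanczos recurrence without
C     reorthogonalization.  On exit, ALPHA(1..NSTEPS) is the diagonal
C     and BETA(2..NSTEPS) the off-diagonal of a small tridiagonal
C     matrix whose extreme eigenvalues approximate those of the large
C     matrix.  MATVEC is a hypothetical user routine for W = A*V.
      INTEGER N, NSTEPS, I, J
      DOUBLE PRECISION ALPHA(NSTEPS), BETA(NSTEPS)
      DOUBLE PRECISION V(N), VOLD(N), W(N)
      DOUBLE PRECISION DDOT, DNRM2
      EXTERNAL MATVEC, DDOT, DNRM2
C     V must enter as a unit-norm starting vector, VOLD as zero.
      BETA(1) = 0.0D0
      DO 30 J = 1, NSTEPS
         CALL MATVEC(N, V, W)
         ALPHA(J) = DDOT(N, V, 1, W, 1)
         DO 10 I = 1, N
            W(I) = W(I) - ALPHA(J)*V(I) - BETA(J)*VOLD(I)
   10    CONTINUE
         IF (J .LT. NSTEPS) THEN
            BETA(J+1) = DNRM2(N, W, 1)
            DO 20 I = 1, N
               VOLD(I) = V(I)
               V(I) = W(I)/BETA(J+1)
   20       CONTINUE
         END IF
   30 CONTINUE
      RETURN
      END

The eigenvalues of the small tridiagonal matrix can then be found with a dense routine (e.g. from LAPACK); the point is that the large matrix is touched only through MATVEC, which is what makes the method suitable for very large sparse problems.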


Diagonalizing sparse symmetric matrices with PLANSO

We currently have the PLANDR2() routine of the PLANSO package (by Kesheng Wu and Horst Simon) running on both a Cray T3E and on a cluster of PCs.

The T3E machine is a Cray T3E-900 located at the Pittsburgh Supercomputing Center (PSC) with 512 application PEs; each PE is a 450 MHz Alpha processor with 128 MB of RAM. The PC cluster consists of Intel-based machines: two dual-processor 400 MHz Pentium II machines and one single-processor 400 MHz Pentium II machine, each with 256 MB of RAM.
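
A key ingredient in such a run is the parallel sparse matrix-vector product that the Lanczos routine applies at every step. Our own distribution of the Hamiltonian over the PEs is not described on this page; for illustration only, the sketch below shows one common arrangement, in which each PE owns a contiguous block of rows in compressed-row storage and assembles the full vector with MPI_ALLGATHERV before the local multiply. All routine and variable names are ours, not taken from PLANSO or the project code.

      SUBROUTINE PARMATVEC(NLOC, NGLOB, IA, JA, VAL, XLOC, YLOC,
     &                     XFULL, COUNTS, DISPLS, COMM)
C     Illustrative distributed sparse matrix-vector product Y = A*X.
C     Each PE owns NLOC contiguous rows of the NGLOB x NGLOB matrix in
C     compressed-row form (IA, JA, VAL).  COUNTS and DISPLS hold the
C     number of rows owned by each PE and their offsets, as required
C     by MPI_ALLGATHERV.
      INCLUDE 'mpif.h'
      INTEGER NLOC, NGLOB, COMM, IERR
      INTEGER IA(NLOC+1), JA(*), COUNTS(*), DISPLS(*)
      DOUBLE PRECISION VAL(*), XLOC(NLOC), YLOC(NLOC), XFULL(NGLOB)
      INTEGER I, K
      DOUBLE PRECISION S
C     Assemble the full vector x on every PE from the local pieces.
      CALL MPI_ALLGATHERV(XLOC, NLOC, MPI_DOUBLE_PRECISION,
     &                    XFULL, COUNTS, DISPLS,
     &                    MPI_DOUBLE_PRECISION, COMM, IERR)
C     Multiply the locally owned rows.
      DO 20 I = 1, NLOC
         S = 0.0D0
         DO 10 K = IA(I), IA(I+1) - 1
            S = S + VAL(K)*XFULL(JA(K))
   10    CONTINUE
         YLOC(I) = S
   20 CONTINUE
      RETURN
      END

Gathering the whole vector onto every PE is simple but communication-heavy; it is the kind of step where the difference between the T3E's interconnect and a PC cluster's network shows up in the timings below.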

Test runs for large matrices

PC cluster

On the PC cluster, using the five available PEs, we have successfully diagonalized a sparse symmetric matrix of order 282,274 with 13,194,799 non-zero elements.

Info we care about:

  Average CPU usage:                   60-80%
  Memory usage:                        approx. 100 MB
  Approx. avg. diagonalizing time/PE:  209 seconds
  Lanczos steps taken:                 124
  # of Ritz values found:              5
  Error bound:                         less than +/- 1.0E-09

Also on the PC cluster, we have diagonalized a slightly smaller matrix of order 206,622 but with more non-zero elements, 15,474,058.

Info we care about:

  Average CPU usage:                   60-80%
  Memory usage:                        approx. 97 MB
  Approx. avg. diagonalizing time/PE:  189 seconds
  Lanczos steps taken:                 134
  # of Ritz values found:              5
  Error bound:                         less than +/- 1.0E-09

T3E

On the Cray T3E at Pittsburgh, we have so far successfully diagonalized a matrix of order 1,037,921 containing 57,694,268 non-zero elements, using half of the available PEs (256).
Info we care about:

  Average CPU usage:                   unknown
  Memory usage:                        approx. 85 MB
  Approx. avg. diagonalizing time/PE:  132 seconds
  Lanczos steps taken:                 127
  # of Ritz values found:              5
  Error bound:                         less than +/- 1.0E-09

Comparing DNLASO and PLANDR2

Our first eigensolver package for large matrices was LASO, of which we used the DNLASO routine. LASO is a uniprocessor package, so we compared it with the PLANSO package run on a single PE. The test matrix had order 110,488 and contained 7,313,136 non-zero elements; this was the largest matrix we could diagonalize with both packages using only a single PE. We also compared these results with multi-PE runs of PLANSO on both the PC cluster and the T3E, using 5 PEs on each and the same test matrix.

A comparison on the PC cluster

                                       DNLASO          PLANDR2              PLANDR2
                                       (1 PE)          (1 PC cluster PE)    (5 PC cluster PEs)
  Diagonalizing time/PE                569 sec.        158 sec.             86 sec.
  % speedup from previous              ----            350%                 180%
  CPU usage                            97-99%          97-99%               60-80%
  Mem. usage for Lanczos vectors/PE    5 MB            60 MB                19 MB
  Lanczos steps                        162             133                  122

A comparison between the PC cluster and the T3E

                                       PLANDR2                         PLANDR2
                                       (5 PC cluster PEs)              (5 T3E PEs)
  Diagonalizing time                   86 sec.                         37 sec.
  % speedup from previous              180% (from 1 PC cluster PE)     230%
  CPU usage                            60-80%                          unknown
  Mem. usage for Lanczos vectors/PE    19 MB                           approx. 20 MB
  Lanczos steps                        122                             131

Comparing the multi-PE T3E run with the multi-PE PC cluster run shows the PC cluster's relative lack of inter-PE communication efficiency. CPU usage on the T3E could not be measured, but it is easy to deduce from the diagonalization times that the T3E achieves higher CPU utilization than our local PC cluster, mostly because communication between processors is slower on the cluster.
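
The interconnect speeds themselves are not reported here. If one wanted to measure them, a standard check is a simple MPI ping-pong test between two PEs; the program below is an illustrative sketch, not part of the project code.

      PROGRAM PINGPONG
C     Illustrative MPI ping-pong test: PE 0 sends NDBL double
C     precision numbers to PE 1, which sends them straight back.
C     The average round-trip time gives a rough measure of the
C     latency and bandwidth of the interconnect.
      INCLUDE 'mpif.h'
      INTEGER NDBL, NREP
      PARAMETER (NDBL = 10000, NREP = 100)
      DOUBLE PRECISION BUF(NDBL), T0, T1
      INTEGER RANK, IERR, I, STATUS(MPI_STATUS_SIZE)
      CALL MPI_INIT(IERR)
      CALL MPI_COMM_RANK(MPI_COMM_WORLD, RANK, IERR)
      DO 5 I = 1, NDBL
         BUF(I) = 0.0D0
    5 CONTINUE
      T0 = MPI_WTIME()
      DO 10 I = 1, NREP
         IF (RANK .EQ. 0) THEN
            CALL MPI_SEND(BUF, NDBL, MPI_DOUBLE_PRECISION, 1, 0,
     &                    MPI_COMM_WORLD, IERR)
            CALL MPI_RECV(BUF, NDBL, MPI_DOUBLE_PRECISION, 1, 0,
     &                    MPI_COMM_WORLD, STATUS, IERR)
         ELSE IF (RANK .EQ. 1) THEN
            CALL MPI_RECV(BUF, NDBL, MPI_DOUBLE_PRECISION, 0, 0,
     &                    MPI_COMM_WORLD, STATUS, IERR)
            CALL MPI_SEND(BUF, NDBL, MPI_DOUBLE_PRECISION, 0, 0,
     &                    MPI_COMM_WORLD, IERR)
         END IF
   10 CONTINUE
      T1 = MPI_WTIME()
      IF (RANK .EQ. 0) PRINT *, 'avg round trip (s):', (T1-T0)/NREP
      CALL MPI_FINALIZE(IERR)
      END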

Note that the store function provided in the file "store.f" was slightly modified when building the lanso1 and plan libraries: the array A was enlarged from 500,000 to 5,000,000 elements to accommodate the large Lanczos vectors.
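
For reference, store.f keeps the Lanczos vectors in a fixed-size in-memory workspace, so the modification amounts to nothing more than enlarging that array. The lines below sketch only the change itself; the surrounding code in store.f is not reproduced here and may differ in detail.

C     In store.f (sketch of the change only):
C     before:
C         DOUBLE PRECISION A(500000)
C     after:
          DOUBLE PRECISION A(5000000)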


Student project summary

The purpose of this internship was to modify Dr. van de Sande's code so that it diagonalizes certain Hamiltonian matrices in a truly parallel manner. His code used the DNLASO subroutine of the LASO package as well as some LAPACK routines for smaller matrices. The code was parallel only in the sense that a "master" process controlled "slave" processes, each working on a separate problem, with the master gathering the answers at the end. Our goal was to modify the code so that all processes work together on the same problem and then leave the answer with the master process.
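
Schematically, the target structure looks like the following: every PE takes part in the same diagonalization and the master (rank 0) reports the answer. PARALLEL_DIAG is a hypothetical placeholder for the parallel eigensolver (eventually PLANDR2); its argument list is invented for this sketch and is not the real PLANDR2 interface.

      PROGRAM DRIVER
C     Sketch of the intended structure: all PEs cooperate on one
C     diagonalization, and the master (rank 0) ends up with and
C     reports the answer.  PARALLEL_DIAG is a hypothetical stand-in
C     for the eigensolver call; its arguments (communicator, number
C     of eigenvalues wanted, number converged, the values) are
C     invented for this sketch.
      INCLUDE 'mpif.h'
      INTEGER MAXEV
      PARAMETER (MAXEV = 5)
      INTEGER RANK, NPES, NEIG, IERR, I
      DOUBLE PRECISION EVAL(MAXEV)
      CALL MPI_INIT(IERR)
      CALL MPI_COMM_RANK(MPI_COMM_WORLD, RANK, IERR)
      CALL MPI_COMM_SIZE(MPI_COMM_WORLD, NPES, IERR)
C     Every PE calls the solver on its share of the same matrix.
      CALL PARALLEL_DIAG(MPI_COMM_WORLD, MAXEV, NEIG, EVAL)
C     Only the master reports the converged eigenvalues.
      IF (RANK .EQ. 0) THEN
         PRINT *, 'PEs used:', NPES
         PRINT *, 'lowest eigenvalues:'
         DO 10 I = 1, NEIG
            PRINT *, EVAL(I)
   10    CONTINUE
      END IF
      CALL MPI_FINALIZE(IERR)
      END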

The summer internship started with learning the Linux operating system and other fundamentals of the research to come. The next task was diagonalizing a test matrix using LAPACK routines, which also provided practice with the basics of the GNU compilers. Next, the LASO package with its DNLASO subroutine was used to diagonalize the same test matrix. A matrix-vector multiply was then written to use Dr. van de Sande's format for storing the sparse symmetric Hamiltonian matrix (a generic sketch of such a product appears below), bringing the new code to roughly the same point as his existing diagonalization code. His code and the new code were then benchmarked against each other.
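
The storage format itself belongs to Dr. van de Sande's code and is not described on this page. For illustration only, the sketch below multiplies with a symmetric matrix whose non-zero elements of the upper triangle (including the diagonal) are kept in coordinate form, so that each off-diagonal element contributes to two rows of the result; the real format and routine differ.

      SUBROUTINE SYMMATVEC(N, NNZ, IROW, JCOL, VAL, X, Y)
C     Illustrative sparse symmetric matrix-vector product Y = A*X.
C     Only the NNZ non-zero elements of the upper triangle are kept,
C     as triples (IROW(K), JCOL(K), VAL(K)) with JCOL(K) >= IROW(K);
C     each off-diagonal element is applied to both symmetric slots.
      INTEGER N, NNZ, IROW(NNZ), JCOL(NNZ)
      DOUBLE PRECISION VAL(NNZ), X(N), Y(N)
      INTEGER I, K
      DO 10 I = 1, N
         Y(I) = 0.0D0
   10 CONTINUE
      DO 20 K = 1, NNZ
         Y(IROW(K)) = Y(IROW(K)) + VAL(K)*X(JCOL(K))
         IF (IROW(K) .NE. JCOL(K)) THEN
            Y(JCOL(K)) = Y(JCOL(K)) + VAL(K)*X(IROW(K))
         END IF
   20 CONTINUE
      RETURN
      END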

The next step was researching the Cray T3E at PSC and its environment, since this is where the code would eventually be fully tested and run. After the Cray had been researched, research began on packages that could diagonalize the Hamiltonians completely in parallel; this search turned up the PLANSO package. After extensive testing and debugging, the code was run on both the PC cluster and on the Cray; the benchmarks are listed above. During the final debugging and testing, another similar package, TRLan, was discovered and also investigated. It is reported to produce the same results as PLANSO, but it could not be fully evaluated for lack of a Fortran 90 compiler. After the benchmarking and final testing, the PLANSO code was integrated with Dr. van de Sande's code, and some final modifications of the new code, his code, and the PLANSO package were made. This is where the internship left off.

Brett van de Sande,
Homepage: http://www.geneva.edu/~bvds and
E-mail: bvds@pitt.edu