This project focuses on the more computational aspects of the transverse lattice. From a computational point of view, we must calculate the lowest eigenvalues and eigenvectors of very large matrices. We are developing computer code that performs these computations efficiently on a large parallel computer. Our application for computing resources at PSC describes the status as of 1999.
We have adapted the PLANDR2() function of the PLANSO package (Kesheng Wu and Horst Simon) to run on both a Cray T3E and on a cluster of PCs.
The T3E machine is a Cray T3E-900 located at PSC in Pittsburgh, with 512 application PEs; each PE is a 450 MHz Alpha processor with 128 MB of RAM. The PC cluster consists of Intel-based machines: two dual-processor 400 MHz Pentium II machines and one single-processor 400 MHz Pentium II machine. All three machines have 256 MB of RAM.
PC cluster
On the PC cluster, using the 5 available PEs, we have successfully diagonalized a sparse symmetric matrix of order 282,274 with 13,194,799 non-zero elements.
Info we care about:

Average CPU usage                   | 60-80%
Memory usage                        | approx. 100 MB
Approx. avg diagonalization time/PE | 209 seconds
Lanczos steps taken                 | 124
Ritz values found                   | 5
Error bound                         | less than +/- 1.0E-09
Info we care about:

Average CPU usage                   | 60-80%
Memory usage                        | approx. 97 MB
Approx. avg diagonalization time/PE | 189 seconds
Lanczos steps taken                 | 134
Ritz values found                   | 5
Error bound                         | less than +/- 1.0E-09
T3E
On the Cray T3E at Pittsburgh, we have so far successfully diagonalized a matrix of order 1,037,921 containing 57,694,268 non-zero elements, using half of the machine's available PEs (256).
Info we care about:

Average CPU usage                   | unknown
Memory usage                        | approx. 85 MB
Approx. avg diagonalization time/PE | 132 seconds
Lanczos steps taken                 | 127
Ritz values found                   | 5
Error bound                         | less than +/- 1.0E-09
DNLASO and PLANDR2

Dr. van de Sande's code used the DNLASO function of the LANSO package. LANSO is a uniprocessor package, so we compared it with the PLANSO package run on a single PE. Our test matrix had order 110,488 and contained 7,313,136 non-zero elements; this was the largest matrix we could diagonalize with both this package and the PLANSO package using only a single PE. We also compared these results with multi-PE runs of PLANSO on both the PC cluster and the T3E, using 5 PEs on each and the same test matrix.
A comparison on the PC cluster
                              | DNLASO (1 PE) | PLANDR2 (1 PC cluster PE) | PLANDR2 (5 PC cluster PEs)
Diagonalization time/PE       | 569 sec       | 158 sec                   | 86 sec
Speedup from previous column  | ----          | 350%                      | 180%
CPU usage                     | 97-99%        | 97-99%                    | 60-80%
Memory for Lanczos vectors/PE | 5 MB          | 60 MB                     | 19 MB
Lanczos steps                 | 162           | 133                       | 122
A comparison between the PC cluster and the T3E

                              | PLANDR2 (5 PC cluster PEs)  | PLANDR2 (5 T3E PEs)
Diagonalization time          | 86 sec                      | 37 sec
Speedup from previous         | 180% (from 1 PC cluster PE) | 230%
CPU usage                     | 60-80%                      | unknown
Memory for Lanczos vectors/PE | 19 MB                       | approx. 20 MB
Lanczos steps                 | 122                         | 131
Note that the store function supplied in the file "store.f" was slightly modified when creating the lanso1 and plan libraries: the array A was changed from 500,000 elements to 5,000,000 elements to accommodate the large Lanczos vector sizes.
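For concreteness, here is a sketch of the kind of change described, in Fortran 77. The surrounding routine is an assumption (the real store.f's argument conventions, ISW cases, and common-block name may differ); only the enlarged dimension of A reflects the actual modification. DCOPY is the standard BLAS copy routine.

c     Sketch only: assumed shape of the LANSO/PLANSO store routine.
c     The Lanczos vectors are kept in the flat scratch array A, so A
c     must hold at least (number of Lanczos steps) * (vector length)
c     double precision numbers.
      SUBROUTINE STORE(N, ISW, J, S)
      INTEGER N, ISW, J
      DOUBLE PRECISION S(N)
c     was: DOUBLE PRECISION A(500000)
      DOUBLE PRECISION A(5000000)
      COMMON /GETPUT/ A
c     ISW selects the operation on Lanczos vector number J
      IF (ISW .EQ. 1) THEN
c        store S as vector J
         CALL DCOPY(N, S, 1, A((J-1)*N+1), 1)
      ELSE
c        retrieve vector J into S
         CALL DCOPY(N, A((J-1)*N+1), 1, S, 1)
      END IF
      RETURN
      END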
The purpose of this internship was to modify Dr. van de Sande's code so that it diagonalizes certain Hamiltonian matrices in a truly parallel manner. His code used the DNLASO subroutine of the LANSO package as well as some LAPACK routines for smaller matrices. The code was parallel only in the sense that a "master" process controlled several "slave" processes, each of which worked on a separate problem; the master then gathered the answers. Our goal was to modify the code so that all processes would work together on the same problem and then leave the answer with the master process.
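The target scheme is easiest to see in a small sketch. The Fortran 77 + MPI program below is ours, for illustration only (none of these names come from the project code): every process computes a piece of one shared quantity, much as each PE computes its share of a dot product in a distributed Lanczos step, and the pieces are combined so that the master (rank 0) holds the final answer.

c     Illustration: all processes cooperate on ONE problem and the
c     master (rank 0) is left holding the answer.
      PROGRAM COOP
      IMPLICIT NONE
      INCLUDE 'mpif.h'
      INTEGER RANK, NPROC, IERR
      DOUBLE PRECISION PART, TOTAL
      CALL MPI_INIT(IERR)
      CALL MPI_COMM_RANK(MPI_COMM_WORLD, RANK, IERR)
      CALL MPI_COMM_SIZE(MPI_COMM_WORLD, NPROC, IERR)
c     each process computes its share of a global sum, as in a
c     distributed dot product over a block of a Lanczos vector
      PART = DBLE(RANK + 1)
c     combine the partial results onto rank 0
      CALL MPI_REDUCE(PART, TOTAL, 1, MPI_DOUBLE_PRECISION,
     &                MPI_SUM, 0, MPI_COMM_WORLD, IERR)
      IF (RANK .EQ. 0) PRINT *, 'NPROC =', NPROC, ' TOTAL =', TOTAL
      CALL MPI_FINALIZE(IERR)
      END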
The summer internship started with learning the Linux operating system and other fundamentals of the research to be done over the summer. The next task was diagonalizing a test matrix using LAPACK routines, which also taught the basics of the GNU compilers. Next, the LANSO package, with its DNLASO subroutine, was used to diagonalize the same test matrix.
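As an illustration of that warm-up step, the following minimal Fortran program diagonalizes a small dense symmetric matrix with LAPACK's DSYEV routine. The 3x3 matrix here is invented for the example; the actual test matrix is not recorded on this page.

c     Minimal LAPACK example: eigenvalues and eigenvectors of a
c     small dense symmetric matrix via DSYEV.
      PROGRAM TESTDIAG
      IMPLICIT NONE
      INTEGER N, LWORK, INFO
      PARAMETER (N = 3, LWORK = 3*N)
      DOUBLE PRECISION A(N,N), W(N), WORK(LWORK)
c     a sample symmetric tridiagonal test matrix
      DATA A / 2.D0, -1.D0,  0.D0,
     &        -1.D0,  2.D0, -1.D0,
     &         0.D0, -1.D0,  2.D0 /
c     'V': compute eigenvectors too; 'U': upper triangle is supplied
      CALL DSYEV('V', 'U', N, A, N, W, WORK, LWORK, INFO)
      IF (INFO .EQ. 0) THEN
         PRINT *, 'eigenvalues:', W
      ELSE
         PRINT *, 'DSYEV failed, INFO =', INFO
      END IF
      END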
Then a matrix-vector multiply was written that uses Dr. van de Sande's format for storing the Hamiltonian as a sparse symmetric matrix, mirroring the corresponding piece of his diagonalization code. His code and the new code were then benchmarked against each other.
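Dr. van de Sande's actual storage format is not documented on this page, so the sketch below assumes a generic coordinate format (row index, column index, value) holding only the upper triangle, with symmetry supplying the lower half. In the Lanczos iteration this multiply is the only place the Hamiltonian enters, which is why it is the natural routine to parallelize.

c     Sketch: Y = H*X for a sparse symmetric H stored as the upper
c     triangle only, in coordinate format (ROW, COL, VAL).  The
c     storage format is an assumption for illustration.
      SUBROUTINE SYMMV(N, NNZ, ROW, COL, VAL, X, Y)
      INTEGER N, NNZ, ROW(NNZ), COL(NNZ), I, K
      DOUBLE PRECISION VAL(NNZ), X(N), Y(N)
      DO I = 1, N
         Y(I) = 0.D0
      END DO
      DO K = 1, NNZ
         Y(ROW(K)) = Y(ROW(K)) + VAL(K)*X(COL(K))
c        an off-diagonal entry (i,j) also acts as (j,i) by symmetry
         IF (ROW(K) .NE. COL(K)) THEN
            Y(COL(K)) = Y(COL(K)) + VAL(K)*X(ROW(K))
         END IF
      END DO
      RETURN
      END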
The next step was researching the Cray T3E at PSC and its environment, since this is where the code would eventually be fully tested and run. After the Cray had been researched, research began on packages that diagonalize the Hamiltonians completely in parallel; this turned up the PLANSO package. After extensive testing and debugging, the code was run on both the PC cluster and the Cray. The benchmarks are listed above.

During the final debugging and testing, another similar package, TRLan, was discovered and investigated. It is said to produce the same results as PLANSO, but it could not be evaluated fully for lack of a Fortran 90 compiler. After the benchmarking and final testing, the PLANSO code was integrated with Dr. van de Sande's code, and some final modifications of the new code, his code, and the PLANSO package were performed. This is where the internship left off.
Brett van de Sande,