The present work reports on a data-parallel molecular dynamics (MD) algorithm that uses Verlet neighbor lists for the local force computations and achieves 5.1 GFLOPS on a 128-node CM-5E. The force computation is as efficient as any serial implementation would be, apart from a 20% loss caused by variations in neighbor-list length. Because of compiler shortcomings, the parallel code performs local indirect memory addressing suboptimally, so truly optimal efficiency would require either an improved compiler or recoding in the CDPEAC assembler of the CM-5. For the neighbor-list construction we took the latter approach. Constructing the neighbor lists takes about 1.6 times as long as one force computation, but it needs to be carried out only once every several MD timesteps.
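One plausible way to picture both the 20% loss and the rebuild overhead is sketched below. This is a minimal serial Python illustration, not the authors' CM-Fortran/CDPEAC code. It assumes full neighbor lists (each pair stored on both atoms) padded with -1 to the longest list, as a data-parallel sweep over neighbor slots would need; a cubic periodic box with minimum-image distances; and a Lennard-Jones potential in reduced units as a stand-in for the metallic potentials actually studied. All function names and parameters (`r_cut`, `skin`, the rebuild interval) are hypothetical, and the O(N²) pair search is used only for brevity.

```python
import random


def perturbed_cubic_lattice(n_side, spacing, jitter, seed=0):
    """Simple-cubic lattice with small random displacements (a homogeneous solid)."""
    rng = random.Random(seed)
    return [[(c + 0.5) * spacing + rng.uniform(-jitter, jitter) for c in (ix, iy, iz)]
            for ix in range(n_side) for iy in range(n_side) for iz in range(n_side)]


def min_image(ri, rj, box):
    """Displacement r_i - r_j under the minimum-image convention (cubic box)."""
    disp = []
    for a in range(3):
        d = ri[a] - rj[a]
        disp.append(d - box * round(d / box))
    return disp


def build_padded_neighbor_list(positions, box, r_cut, skin):
    """Full Verlet list (cutoff r_cut + skin), padded with -1 to the longest list.

    Assumes r_cut + skin < box / 2 so the minimum image is unique.
    """
    n = len(positions)
    r_list2 = (r_cut + skin) ** 2
    lists = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dr = min_image(positions[i], positions[j], box)
            if sum(d * d for d in dr) < r_list2:
                lists[i].append(j)  # full list: store the pair on both sides,
                lists[j].append(i)  # so each atom only updates its own force
    lengths = [len(nbrs) for nbrs in lists]
    max_len = max(lengths, default=0)
    padded = [nbrs + [-1] * (max_len - len(nbrs)) for nbrs in lists]
    return padded, lengths


def slot_efficiency(lengths):
    """Fraction of padded slots holding real neighbors (mean length / max length)."""
    return sum(lengths) / (len(lengths) * max(lengths))


def lj_forces(positions, box, padded, r_cut):
    """Lennard-Jones forces (epsilon = sigma = 1) computed through the padded list."""
    n = len(positions)
    rc2 = r_cut * r_cut
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    n_slots = len(padded[0]) if padded else 0
    for k in range(n_slots):          # one data-parallel "sweep" per neighbor slot
        for i in range(n):            # conceptually all atoms at once
            j = padded[i][k]          # indirect (gathered) neighbor index
            if j < 0:
                continue              # padding slot: an idle lane on a vector unit
            dr = min_image(positions[i], positions[j], box)
            r2 = sum(d * d for d in dr)
            if r2 < rc2:              # list holds r < r_cut + skin; force needs r < r_cut
                inv6 = 1.0 / r2 ** 3
                fscale = 24.0 * inv6 * (2.0 * inv6 - 1.0) / r2
                for a in range(3):
                    forces[i][a] += fscale * dr[a]
    return forces


def amortized_cost(rebuild_ratio, steps_between_rebuilds):
    """Per-step cost in units of one force computation, list rebuild included."""
    return 1.0 + rebuild_ratio / steps_between_rebuilds


if __name__ == "__main__":
    box = 4 * 1.1
    pos = perturbed_cubic_lattice(4, 1.1, 0.05, seed=1)
    padded, lengths = build_padded_neighbor_list(pos, box, r_cut=1.6, skin=0.3)
    print("list lengths: min", min(lengths), "max", max(lengths))
    print("slot efficiency:", slot_efficiency(lengths))
    forces = lj_forces(pos, box, padded, r_cut=1.6)
    print("net force:", [sum(f[a] for f in forces) for a in range(3)])
    print("per-step cost, rebuild every 10 steps:", amortized_cost(1.6, 10))
```

Here `slot_efficiency` measures how much of a padded sweep does useful work, which is one way a spread in list lengths turns into lost throughput. With the 1.6 rebuild ratio reported above and a hypothetical rebuild every 10 steps, `amortized_cost` gives 1 + 1.6/10 = 1.16, i.e. list construction adds about 16% to the per-step cost.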
A final load-balancing issue arises from distributing the system on a uniform grid, but for fairly homogeneous solid systems, such as the metals we are studying, this issue is not significant.
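The uniform-grid issue can be quantified by a simple max/mean ratio of atoms per subdomain. The sketch below is illustrative only: it assumes atoms are assigned to equal cubic subdomains (one per processor) by position and that work per subdomain scales with atom count. The function name `grid_load_imbalance` and the grid sizes are hypothetical.

```python
from collections import Counter


def grid_load_imbalance(positions, box, cells_per_side):
    """Max/mean atom count over a uniform grid of cubic subdomains.

    1.0 means every subdomain (processor) holds the same number of atoms;
    2.0 means the busiest subdomain does twice the average work.
    """
    width = box / cells_per_side
    counts = Counter()
    for pos in positions:
        # Wrap into the box and clamp against floating-point edge cases.
        cell = tuple(min(int((c % box) // width), cells_per_side - 1) for c in pos)
        counts[cell] += 1
    mean = len(positions) / cells_per_side ** 3  # empty subdomains count toward the mean
    return max(counts.values()) / mean


if __name__ == "__main__":
    n_side, spacing = 8, 1.1
    box = n_side * spacing
    lattice = [((ix + 0.5) * spacing, (iy + 0.5) * spacing, (iz + 0.5) * spacing)
               for ix in range(n_side) for iy in range(n_side) for iz in range(n_side)]
    print("homogeneous solid:", grid_load_imbalance(lattice, box, 4))
    clustered = [(0.5 * x, y, z) for x, y, z in lattice]  # all atoms in half the box
    print("clustered system:", grid_load_imbalance(clustered, box, 4))
```

For the regular lattice every subdomain holds the same number of atoms and the ratio is exactly 1.0, whereas squeezing all atoms into half the box doubles the load on the occupied subdomains and gives 2.0, the kind of imbalance a homogeneous solid avoids.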
The parallel Verlet neighbor-list code is about 5 times faster than our previous parallel code, which did not employ neighbor lists.
In conclusion, the only remaining requirement for our data-parallel CM-Fortran code to be optimally efficient is that the CM-Fortran compiler generate efficient code for local indirect memory addressing on the CM-5E vector units. If this were achieved, the code's performance should be as high as that of any serial neighbor-list algorithm, apart from the minor load imbalances discussed above.
Our CM-Fortran code is not restricted to Connection Machines; it should be portable, with limited effort, to High Performance Fortran (HPF) compilers that support the FORALL statement.