Basic Linear Algebra Libraries for High Performance Computing

Authors

  • Иван [Ivan] Сергеевич [S.] Кружилов [Kruzhilov]
  • Михаил [Mikhail] Борисович [B.] Кузьминский [Kuzminsky]
  • Андрей [Andrey] Михайлович [M.] Чернецов [Chernetsov]
  • Ольга [Olga] Юрьевна [Yu.] Шамаева [Shamayeva]

DOI:

https://doi.org/10.24160/1993-6982-2018-6-87-95

Keywords:

linear algebra libraries, high-performance computing, BLAS, ScaLAPACK, MAGMA, MKL, ELPA, PLASMA

Abstract

The article considers linear algebra libraries such as BLAS, LAPACK, ScaLAPACK, MKL, and ATLAS, which support high-performance computing (HPC) on modern architectures and are used both in well-known benchmarks and in a wide range of applications. In most applications, the most time-consuming stages of computation are implemented as calls to subroutines from such libraries; therefore, choosing the right library is an important part of setting up a computation. The main aim of this review is to describe the "invariant" characteristics of libraries that are needed to achieve high application performance. Uses of high-performance computing in various fields of knowledge are briefly reviewed. A classification of linear algebra libraries by functionality and by the high-performance architectures they target is proposed. The basic low-level BLAS library, which is implemented for all HPC architectures, is described. It is pointed out that BLAS supports splitting the computation into several parallel threads on shared-memory systems; for such systems, tools such as OpenMP or OpenACC are used. On distributed-memory systems, the parallel version of this library, PBLAS, is used; it exchanges messages between nodes using the MPI standard. Higher-level libraries built on BLAS are described, e.g., LAPACK, which contains a large set of routines for linear algebra. The ScaLAPACK library for the distributed-memory model, which is based on LAPACK and PBLAS, is presented, as is the Intel MKL library, which represents its modern development. For efficient operation of hybrid systems, the fundamentally new libraries MAGMA and PLASMA, which include features for optimizing linear algebra computations on small matrices, are analyzed. Libraries for solving eigenvalue problems, such as EISPACK, PeIGS, and several others, are examined.
It is pointed out that the new supercomputer-oriented ELPA library can use both OpenMP and MPI. It is noted that operations on sparse matrices, especially matrix multiplication, are highly relevant in many applied fields of science; in this respect, the Sparse BLAS library can be regarded as the basic standard for them. It is concluded that the optimal choice of a library depends essentially on both the particular application and the computing architecture used.
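
The abstract notes that BLAS can split a computation into several parallel threads on a shared-memory system. As an illustration only, the minimal sketch below reproduces the semantics of the Level 3 BLAS operation GEMM, C := alpha*A*B + beta*C (what `dgemm` computes for non-transposed operands), and statically divides the rows of C into contiguous blocks handled by separate threads. The names `gemm_rows` and `parallel_gemm` are hypothetical and are not part of any BLAS interface; it is assumed that matrices are plain Python lists of rows.

```python
from concurrent.futures import ThreadPoolExecutor


def gemm_rows(alpha, A, B, beta, C, row_start, row_end):
    """Overwrite rows [row_start, row_end) of C with alpha*A*B + beta*C."""
    n_inner = len(B)     # shared dimension k
    n_cols = len(B[0])   # number of columns of B and C
    for i in range(row_start, row_end):
        a_row, c_row = A[i], C[i]
        for j in range(n_cols):
            acc = 0.0
            for p in range(n_inner):
                acc += a_row[p] * B[p][j]
            c_row[j] = alpha * acc + beta * c_row[j]


def parallel_gemm(alpha, A, B, beta, C, n_threads=4):
    """C := alpha*A*B + beta*C with the rows of C split statically across threads.

    Each thread writes a disjoint block of rows of C, so no locking is needed.
    In CPython the global interpreter lock serializes these pure-Python loops,
    so this shows the work decomposition, not a real speedup.
    """
    m = len(A)
    block = max(1, -(-m // n_threads))  # ceil(m / n_threads), at least 1
    ranges = [(s, min(s + block, m)) for s in range(0, m, block)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        jobs = [pool.submit(gemm_rows, alpha, A, B, beta, C, s, e)
                for s, e in ranges]
        for job in jobs:
            job.result()  # propagate any exception raised in a worker
    return C
```

For example, with `n_threads=2` and a three-row C, rows 0 and 1 go to one thread and row 2 to the other. Optimized BLAS libraries perform the same kind of partitioning in native threads and additionally block the matrices for the cache hierarchy and use vectorized kernels.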

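The abstract singles out sparse matrix multiplication as especially relevant. A common formulation stores matrices in Compressed Sparse Row (CSR) form and builds each row of C = A*B as a linear combination of rows of B (Gustavson's row-wise algorithm). The minimal sketch below assumes a hypothetical `CSR` dataclass and hypothetical helpers `dense_to_csr`, `csr_to_dense`, and `spgemm`; it does not reproduce the Sparse BLAS or MKL interfaces, which define their own matrix handles and routines.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CSR:
    """Compressed Sparse Row storage (hypothetical helper type).

    Row i holds the nonzeros values[row_ptr[i]:row_ptr[i + 1]],
    located in columns col_idx[row_ptr[i]:row_ptr[i + 1]].
    """
    n_rows: int
    n_cols: int
    row_ptr: List[int]
    col_idx: List[int]
    values: List[float]


def dense_to_csr(M: List[List[float]]) -> CSR:
    row_ptr, col_idx, values = [0], [], []
    for row in M:
        for j, v in enumerate(row):
            if v != 0:
                col_idx.append(j)
                values.append(v)
        row_ptr.append(len(values))
    return CSR(len(M), len(M[0]) if M else 0, row_ptr, col_idx, values)


def csr_to_dense(S: CSR) -> List[List[float]]:
    D = [[0.0] * S.n_cols for _ in range(S.n_rows)]
    for i in range(S.n_rows):
        for pos in range(S.row_ptr[i], S.row_ptr[i + 1]):
            D[i][S.col_idx[pos]] = S.values[pos]
    return D


def spgemm(A: CSR, B: CSR) -> CSR:
    """C = A*B, computed row by row (Gustavson-style) with a sparse accumulator."""
    if A.n_cols != B.n_rows:
        raise ValueError("inner dimensions do not match")
    row_ptr, col_idx, values = [0], [], []
    for i in range(A.n_rows):
        acc = {}  # column index -> partial sum for row i of C
        for a_pos in range(A.row_ptr[i], A.row_ptr[i + 1]):
            k, a_ik = A.col_idx[a_pos], A.values[a_pos]
            # Row i of C gains a_ik times row k of B.
            for b_pos in range(B.row_ptr[k], B.row_ptr[k + 1]):
                j = B.col_idx[b_pos]
                acc[j] = acc.get(j, 0.0) + a_ik * B.values[b_pos]
        for j in sorted(acc):      # keep column indices sorted within a row
            if acc[j] != 0.0:      # drop entries that cancelled exactly
                col_idx.append(j)
                values.append(acc[j])
        row_ptr.append(len(values))
    return CSR(A.n_rows, B.n_cols, row_ptr, col_idx, values)
```

Each row of C depends only on the corresponding row of A, so rows can be distributed across threads or processes independently, and the work grows mainly with the number of nonzero products rather than with the full dense dimensions.
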
Author Biographies

Иван [Ivan] Сергеевич [S.] Кружилов [Kruzhilov]

Academic degree:

Ph.D. (Techn.)

Workplace

Applied Mathematics Dept., NRU MPEI

Occupation

Assistant Professor

Михаил [Mikhail] Борисович [B.] Кузьминский [Kuzminsky]

Academic degree:

Ph.D. (Chem.)

Workplace

N.D. Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences

Occupation

Senior Researcher

Андрей [Andrey] Михайлович [M.] Чернецов [Chernetsov]

Academic degree:

Ph.D. (Techn.)

Workplace

Applied Mathematics Dept., NRU MPEI; Dorodnicyn Computing Centre FRC CSC RAS

Occupation

Assistant Professor; Research Assistant

Ольга [Olga] Юрьевна [Yu.] Шамаева [Shamayeva]

Academic degree:

Ph.D. (Techn.)

Workplace

Applied Mathematics Dept., NRU MPEI

Occupation

Assistant Professor

---
For citation (Russian): Кружилов И.С., Кузьминский М.Б., Чернецов А.М., Шамаева О.Ю. Базовые библиотеки линейной алгебры для высокопроизводительных расчетов [Basic Linear Algebra Libraries for High Performance Computing] // Вестник МЭИ [MPEI Vestnik]. 2018. No. 6. Pp. 87–95. DOI: 10.24160/1993-6982-2018-6-87-95.
---
For citation: Kruzhilov I.S., Kuzminsky M.B., Chernetsov A.M., Shamayeva O.Yu. Basic Linear Algebra Libraries for High Performance Computing. MPEI Vestnik. 2018;6:87–95. (in Russian). DOI: 10.24160/1993-6982-2018-6-87-95.

Published

2018-12-01

Issue

No. 6 (2018)

Section

Informatics, computer engineering and control (05.13.00)