What are BLAS and LAPACK

I've wanted to write about this subject since I started reading about NMatrix.

At the beginning, the names BLAS, LAPACK and ATLAS confused me: imagine a young programmer, without formal training, trying to understand what a “de facto application programming interface standard” is, with lots of strangely named functions and references to the ancient FORTRAN language.

As of now, I think my understanding is sufficient to write about them.

Meaning, definitions and madness

BLAS (Basic Linear Algebra Subprograms) is a standard that provides three levels of functions for different kinds of linear algebra operations. Consider \alpha and \beta as scalars, x and y as vectors and A, B and T (triangular) as matrices. The levels are divided as follows (there's a small sketch of a level 3 call right after the list):

  1. Scalar and vector operations of the form y = \alpha * x + y, dot product and vector norms.
  2. Matrix-vector operations of the form y = \alpha * A * x + \beta * y and solving T * x = y.
  3. Matrix-matrix operations of the form C = \alpha * A * B + \beta * C and solving B = \alpha * T^{-1} * B. GEMM (GEneral Matrix Multiply) is contained in this level.
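
To make the levels a bit more concrete, here's a small sketch of a level 3 call through NMatrix's BLAS binding. I'm assuming the NMatrix::BLAS.gemm wrapper with its default \alpha = 1 and \beta = 0; treat the exact names and defaults as version-dependent rather than canonical.

    require 'nmatrix'

    a = NMatrix.new([2, 2], [1.0, 2.0, 3.0, 4.0], dtype: :float64)
    b = NMatrix.new([2, 2], [5.0, 6.0, 7.0, 8.0], dtype: :float64)

    # Level 3: C = alpha * A * B + beta * C. With the default alpha = 1 and
    # beta = 0, this is plain matrix multiplication (exactly what GEMM computes).
    c = NMatrix::BLAS.gemm(a, b)

    # The higher-level API; for dense float matrices this should also end up
    # in GEMM under the hood.
    c_again = a.dot(b)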

LAPACK (Linear Algebra PACKage) provides a much larger collection of routines, from solving linear systems to eigenvalue problems and matrix factorizations. It's much better to take a look at its documentation when you're looking for something specific.

A bit of history

BLAS was first published in 1979, as can be seen in this paper. An interesting part of it is the section named Reasons for Developing the Package:

  1. It can serve as a conceptual aid in both the design and coding stages of a programming effort to regard an operation such as the dot product as a basic building block.

  2. It improves the self-documenting quality of code to identify an operation such as the dot product by a unique mnemonic name.

  3. Since a significant amount of the execution time in complicated linear algebraic programs may be spent in a few low level operations, a reduction of the execution time spent in these operations may be reflected in cost savings in the running of programs. Assembly language coded subprograms for these operations provide such savings on some computers.

  4. The programming of some of these low level operations involves algorithmic and implementation subtleties that are likely to be ignored in the typical applications programming environment. For example, the subprograms provided for the modified Givens transformation incorporate control of the scaling terms, which otherwise can drift monotonically toward underflow.

So it seems we still use BLAS for the very reasons it was created. The paper's a pretty good read if you have the time. (And if you don't know what a Givens transformation is, read this.)

LAPACK was first published in 1992, as can be seen in the release history. By reading the LAWNs (LAPACK Working Notes), we can get a pretty good picture of its beginnings, e.g. papers presenting techniques that were later added to it and installation notes (with instructions of the sort “[…] by sending the authors a hard copy of the output files or by returning the distribution tape with the output files stored on it”).

Implementations

There are various implementations of the BLAS API, e.g. by Intel, AMD, Apple and the GNU Scientific Library. The one supported by NMatrix is ATLAS (Automatically Tuned Linear Algebra Software), a very cool project that runs automated, empirical tuning at build time to find the compilation parameters that maximize the performance of its BLAS and LAPACK implementations.

As for LAPACK, its original goal was “to make the widely used EISPACK and LINPACK libraries run efficiently on shared-memory vector and parallel processors” (source). Simply put, it’s a library for speeding up various matrix-related routines by taking advantage of each architecture’s memory hierarchy. The trick is that it uses block algorithms for dealing with matrices instead of an element-by-element approach. This way, less time is spent moving data around. It’s written in Fortran 90.

Another important point regarding LAPACK is that it doesn't implement the low-level operations itself: it assumes a good BLAS implementation is already available for the system at hand and gets its speed by calling the level 3 operations as much as possible. The toy sketch below gives a feel for why that blocking strategy pays off.
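
Here's the sketch, in plain Ruby (arrays of arrays, nothing NMatrix-specific): a toy matrix multiply that works block by block. Each block is loaded once and reused many times, which is the whole point; LAPACK applies the same partitioning idea to its factorizations and hands the per-block work to level 3 BLAS.

    # Toy blocked matrix multiply: C = A * B for n x n arrays of arrays.
    # Working on bs x bs blocks keeps each block in fast memory while it's
    # being reused, instead of streaming elements in one by one.
    def blocked_multiply(a, b, bs = 64)
      n = a.size
      c = Array.new(n) { Array.new(n, 0.0) }
      (0...n).step(bs) do |ii|
        (0...n).step(bs) do |jj|
          (0...n).step(bs) do |kk|
            (ii...[ii + bs, n].min).each do |i|
              (kk...[kk + bs, n].min).each do |k|
                aik = a[i][k]
                (jj...[jj + bs, n].min).each do |j|
                  c[i][j] += aik * b[k][j]
                end
              end
            end
          end
        end
      end
      c
    end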

Function naming conventions

One of the strangest things about BLAS and LAPACK is how their functions are named. In LAPACK, a subroutine name is of the form pmmaaa, where:

  • p is the type of the numbers used, e.g. S for single-precision floating-point and Z for double-precision complex.
  • mm is the kind of matrix used in the algorithm, e.g. GE for GEneral matrices, SY for SYmmetric and TB for Triangular Band.
  • aaa is the algorithm implemented by the subroutine, e.g. QRF for QR factorization, TRS for solving a linear system using a previously computed factorization.

BLAS functions are named as <character><name><mod>, which, although similar to LAPACK's scheme, differs slightly depending on the level. <character> is the data type regardless of the level. In level 1, <name> is the operation type, while in levels 2 and 3 it's the type of the matrix argument. For each level, there are specific values that <mod> (if present) can take, each providing additional information about the operation. The sketch after this paragraph decodes a few of these names.
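
To make both conventions concrete, here's a tiny helper (hypothetical, written just for this post, not part of NMatrix or LAPACK) that splits a LAPACK-style name into its parts, plus a few decoded BLAS names as comments.

    # Hypothetical helper for illustration only: split a LAPACK routine name
    # of the form pmmaaa into precision, matrix kind and algorithm.
    def decode_lapack_name(name)
      { precision: name[0], matrix_kind: name[1, 2], algorithm: name[3..-1] }
    end

    decode_lapack_name("DGETRF") #=> {:precision=>"D", :matrix_kind=>"GE", :algorithm=>"TRF"}
    decode_lapack_name("SSYEVR") #=> {:precision=>"S", :matrix_kind=>"SY", :algorithm=>"EVR"}

    # BLAS names follow the same spirit:
    #   DAXPY = D + AXPY      (level 1: y = alpha * x + y)
    #   DGEMV = D + GE + MV   (level 2: general matrix-vector multiply)
    #   DGEMM = D + GE + MM   (level 3: general matrix-matrix multiply)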

These arcane names come from the fact that FORTRAN identifiers were limited to 6 characters in length. Fortran 90 lifted the limit to 31 characters, but the names used in BLAS and LAPACK remain to this day.

Use in NMatrix

NMatrix has bindings to both BLAS and LAPACK. Let me show you:
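
Here's a minimal sketch of what that looks like from Ruby. I'm assuming the NMatrix::BLAS.gemm wrapper and the LAPACK-backed helpers (factorize_lu, det, invert) as they existed around the time of writing; double-check the current API before copying this, since names and signatures may have changed.

    require 'nmatrix'

    a = NMatrix.new([3, 3], [1.0, 2.0, 3.0,
                             4.0, 5.0, 6.0,
                             7.0, 8.0, 10.0], dtype: :float64)

    # BLAS level 3 through the binding: C = A * A (GEMM under the hood).
    c = NMatrix::BLAS.gemm(a, a)

    # LAPACK-backed helpers: LU factorization (GETRF) and routines built on it.
    lu      = a.factorize_lu
    det     = a.det
    inverse = a.invert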

If you want to take a look at the low-level bindings, grab some coffee and read the ext/nmatrix/math/ directory. Since commit 8f129f, it has been greatly simplified and can actually be understood.

References

Below you can find a list of the main resources used in this post.
