CuBlas matrix multiplication with C-style arraysPosted on 17 June 2013
CuBlas has decently optimized calls, but it stuck with column-first indexing, which makes it mind-bogglingly annoying to use in C code. Refusing to switch to Fortran-style indexing, I spent some time figuring which parameter should be what, and which matrix should be transposed and which one should not be.
Suppose you want to calculate a plain old-school matrix product, where there is nothing fancy about row lengths, the second row start right on the spot where row length indicates it should. The two matrices are D and E, the product is F=DE. None of the matrices exhibit any kind of special structure, they are not symmetric or Hermitian, they are definitely not square, and the elements are single-precision floats. You would like to make the following function call:
cublasSgemm(cublasHandle_t handle, cublasOperation_t transa, cublasOperation_t transb, int m, int n, int k, const float *alpha, const float *A, int lda, const float *B, int ldb, const float *beta, float *C, int ldc)
Some parameters are easier than others. The handle is just the usual CuBlas handle which is better initialized somewhere before the call. Parameter alpha will be one, and beta zero. The rest of the parameters are trickier.
This is how the matrix D look like in memory with C-style indexing:
This is how CuBlas sees them:
The important thing to notice that they do not need to be transposed to
get the correct result, flipping their order will be sufficient. So DE
C-style will be ED Fortran-style. With this, we can establish five more
transa will be
CUBLAS_OP_N (=not transposed),
CUBLAS_OP_N, A will be E, B will be D, and the result C will be F.
m is the number of rows of matrices A and C. This will not correspond to
our row numbers because we flipped the order of multiplication. So this
value will be in fact the number of columns of E (denoted by
colE). The parameter
the number of columns of matrices B and C, which in our case will
rowD. The parameter
k is the number of columns of A and rows of B, in
our case, it will be colD.
Perhaps the hardest to figure out is the leading dimensions of the
ldc. In Fortran, this would be the distance in
memory between two consecutive columns of the respective matrices.
Since we have C-style arrays, this will be the distance between two
consecutive rows, which is, of course, the number of columns. So the
ldc will become
We would be fools not to make use of the excellent Thrust library in the rest of our code, so let us further assume that the matrices are stored as Thrust vectors, so they need to be cast in raw-pointer forms for the CuBlas call. Putting it all together, the call will be:
float alpha = 1.0f; float beta = 0.0f; status = cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, colE, rowD, colD, &alpha, thrust::raw_pointer_cast(&E), colE, thrust::raw_pointer_cast(&D), colD, &beta, thrust::raw_pointer_cast(&F), colE);
of code shows a working example. It should compile with
matrix_product_c_view matrix_product_c_view.cu -lcublas.