
Optimized CUDA matrix hamming distance

Original post 2016-07-09 00:56:41 9 1 c++/ c/ matrix/ cuda/ hamming-distance

Is anyone aware of an optimized CUDA kernel for computing a GEMM-style Hamming distance between two matrices of dimensions A x N and N x B? The problem is nearly identical to GEMM, but instead of multiplying and summing the elements of each pair of vectors, it computes sum( a_n != b_n ) over n in {1 ... N} for each row of the first matrix and each column of the second.

I wanted to verify before writing my own, since this problem is relatively common, but I haven't had success in finding code for it yet. Suggestions for code to modify would be excellent as well.
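To pin down the operation being described, here is a minimal, unoptimized sketch with one thread per output element. It assumes `int` elements and row-major storage, neither of which is stated above; the names `hammingNaive` and `hammingMatrixNaive` are hypothetical, and error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// C[i * colsB + j] = number of n in [0, N) with A[i * N + n] != Bm[n * colsB + j].
// Assumption: A is rowsA x N and Bm is N x colsB, both row-major, int elements.
__global__ void hammingNaive(const int* A, const int* Bm, int* C,
                             int rowsA, int N, int colsB)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= rowsA || col >= colsB) return;

    int count = 0;
    for (int n = 0; n < N; ++n)
        count += (A[row * N + n] != Bm[n * colsB + col]);  // compare-and-add
    C[row * colsB + col] = count;
}

// Hypothetical host helper: copies inputs to the device, launches the kernel,
// and returns the rowsA x colsB result. CUDA error checks are left out.
std::vector<int> hammingMatrixNaive(const std::vector<int>& A,
                                    const std::vector<int>& Bm,
                                    int rowsA, int N, int colsB)
{
    int *dA = nullptr, *dB = nullptr, *dC = nullptr;
    size_t bytesA = A.size() * sizeof(int);
    size_t bytesB = Bm.size() * sizeof(int);
    size_t bytesC = static_cast<size_t>(rowsA) * colsB * sizeof(int);

    cudaMalloc(&dA, bytesA);
    cudaMalloc(&dB, bytesB);
    cudaMalloc(&dC, bytesC);
    cudaMemcpy(dA, A.data(), bytesA, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, Bm.data(), bytesB, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((colsB + block.x - 1) / block.x, (rowsA + block.y - 1) / block.y);
    hammingNaive<<<grid, block>>>(dA, dB, dC, rowsA, N, colsB);

    std::vector<int> C(static_cast<size_t>(rowsA) * colsB);
    cudaMemcpy(C.data(), dC, bytesC, cudaMemcpyDeviceToHost);  // waits for the kernel

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}
```

Every thread re-reads a full row and column from global memory, so this version is only useful as a correctness reference for a tuned kernel.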

EDIT:

In addition to kangshiyin's suggestions below, I found this walk-through of an optimized SGEMM implementation to be extraordinarily helpful in understanding steps beyond the basic shared memory matrix multiplication example in the CUDA C Programming Guide.

1 answer

You are right that you could write your kernel by modifying gemm() code. The CUDA samples include a simple implementation of gemm(), but it is too simple: its performance is bounded by shared memory access, giving only ~250 Gflops on Kepler devices. For higher performance, you may want to check the gemm() code in MAGMA.

http://icl.cs.utk.edu/magma/index.html

These two papers also explain how to implement and tune gemm().

http://staff.kfupm.edu.sa/ics/ahkhan/Resources/Papers/Autotuning/Autotuning%2520GEMM%2520Kernels%2520for%2520the%2520Fermi%2520GPU.pdf

http://www.netlib.org/lapack/lawnspdf/lawn267.pdf
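As a starting point before any MAGMA-style tuning, the simple shared-memory tiling scheme can be adapted by swapping the multiply-add for a compare-add. The sketch below assumes `int` elements, row-major storage and a 16 x 16 tile; `hammingTiled`, `hammingMatrixTiled` and `TILE` are hypothetical names, error checking is omitted, and this is not the MAGMA approach (which adds register blocking on top of tiling).

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int TILE = 16;  // assumed tile width; the block must be TILE x TILE threads

// Shared-memory tiled Hamming distance: A is rowsA x N, Bm is N x colsB, row-major.
__global__ void hammingTiled(const int* A, const int* Bm, int* C,
                             int rowsA, int N, int colsB)
{
    __shared__ int tileA[TILE][TILE];
    __shared__ int tileB[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    int count = 0;

    for (int t = 0; t < N; t += TILE) {
        int aCol = t + threadIdx.x;
        int bRow = t + threadIdx.y;
        // Stage one tile of each operand; out-of-range slots are filled with 0.
        tileA[threadIdx.y][threadIdx.x] =
            (row < rowsA && aCol < N) ? A[row * N + aCol] : 0;
        tileB[threadIdx.y][threadIdx.x] =
            (bRow < N && col < colsB) ? Bm[bRow * colsB + col] : 0;
        __syncthreads();

        // Only the first kMax entries of this tile lie inside the N dimension.
        int kMax = (N - t < TILE) ? (N - t) : TILE;
        for (int k = 0; k < kMax; ++k)
            count += (tileA[threadIdx.y][k] != tileB[k][threadIdx.x]);
        __syncthreads();  // finish reading before the next tile overwrites
    }

    if (row < rowsA && col < colsB)
        C[row * colsB + col] = count;
}

// Hypothetical host helper; CUDA error checks are left out.
std::vector<int> hammingMatrixTiled(const std::vector<int>& A,
                                    const std::vector<int>& Bm,
                                    int rowsA, int N, int colsB)
{
    int *dA = nullptr, *dB = nullptr, *dC = nullptr;
    size_t bytesA = A.size() * sizeof(int);
    size_t bytesB = Bm.size() * sizeof(int);
    size_t bytesC = static_cast<size_t>(rowsA) * colsB * sizeof(int);

    cudaMalloc(&dA, bytesA);
    cudaMalloc(&dB, bytesB);
    cudaMalloc(&dC, bytesC);
    cudaMemcpy(dA, A.data(), bytesA, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, Bm.data(), bytesB, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((colsB + TILE - 1) / TILE, (rowsA + TILE - 1) / TILE);
    hammingTiled<<<grid, block>>>(dA, dB, dC, rowsA, N, colsB);

    std::vector<int> C(static_cast<size_t>(rowsA) * colsB);
    cudaMemcpy(C.data(), dC, bytesC, cudaMemcpyDeviceToHost);  // waits for the kernel

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}
```

Each element loaded into shared memory is reused by 16 threads, which cuts global memory traffic, but every compare still reads two shared-memory values. That is the shared-memory bound mentioned above, and it is what register blocking in tuned gemm() kernels is meant to relieve.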

Unlike gemm(), which has hardware support through the FMA instruction for fast multiply-and-add, your desired compare-and-add operation may need more instructions, so the performance should be lower. Considering that the peak performance of gemm() is ~3 Tflops on Kepler, you may be able to get 0.5~2 Tflops for the Hamming distance matrix calculation.




solution1 3 ACCEPTED 2016-07-09 04:47:36
