Finding the maximum and minimum using CUBLAS

Question

I'm having trouble understanding why my function, which finds the maximum and minimum in an array of doubles using CUBLAS, doesn't work properly.

The code is as follows:

void findMaxAndMinGPU(double* values, int* max_idx, int* min_idx, int n)
{
    double* d_values;
    cublasHandle_t handle;
    cublasStatus_t stat;
    safecall( cudaMalloc((void**) &d_values, sizeof(double) * n), "cudaMalloc     (d_values) in findMaxAndMinGPU");
    safecall( cudaMemcpy(d_values, values, sizeof(double) * n, cudaMemcpyHostToDevice), "cudaMemcpy (h_values > d_values) in findMaxAndMinGPU");
    cublasCreate(&handle);

    stat = cublasIdamax(handle, n, d_values, sizeof(double), max_idx);
    if (stat != CUBLAS_STATUS_SUCCESS)
        printf("Max failed\n");

    stat = cublasIdamin(handle, n, d_values, sizeof(double), min_idx);
    if (stat != CUBLAS_STATUS_SUCCESS)
        printf("min failed\n");

    cudaFree(d_values);
    cublasDestroy(handle);
}

Here values holds the values to search, and max_idx and min_idx receive the indices of the maximum and minimum found in values. The results from the CUBLAS calls seem rather random and give wrong indices.

Anyone with a golly good answer to my problem? I am a tad sad at the moment :(

Answer 1

One of the arguments to both your cublasIdamax and cublasIdamin calls is wrong. The incx argument in BLAS level 1 calls should always be the stride of the input in words (that is, elements), not bytes. So I suspect that you want something more like:

stat = cublasIdamax(handle, n, d_values, 1, max_idx);
if (stat != CUBLAS_STATUS_SUCCESS)
    printf("Max failed\n");

stat = cublasIdamin(handle, n, d_values, 1, min_idx);
if (stat != CUBLAS_STATUS_SUCCESS)
    printf("min failed\n");

By using sizeof(double) you are telling the routines to use a stride of 8, which makes the calls overrun the allocated storage of the input array and read uninitialised memory. I presume you actually have a stride of 1 in d_values.
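To make the stride issue concrete, here is a minimal host-side sketch of how a BLAS-style amax walks its input. The names ref_idamax and last_offset_read are hypothetical (they are not CUBLAS functions); the code only mirrors the documented semantics of visiting x[0], x[incx], x[2*incx], ... and returning a 1-based position.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical host-side reference, not part of CUBLAS.
// Examines n elements spaced incx apart and returns the 1-based position
// of the first element with the largest absolute value (0 if n or incx < 1).
int ref_idamax(int n, const double* x, int incx)
{
    if (n < 1 || incx < 1) return 0;
    int best = 0;
    for (int i = 1; i < n; ++i) {
        const std::size_t cur = static_cast<std::size_t>(i) * incx;
        const std::size_t top = static_cast<std::size_t>(best) * incx;
        if (std::fabs(x[cur]) > std::fabs(x[top]))
            best = i;
    }
    return best + 1;
}

// Largest array offset (in elements) such a routine reads for a given n and incx.
std::size_t last_offset_read(int n, int incx)
{
    if (n < 1 || incx < 1) return 0;
    return static_cast<std::size_t>(n - 1) * incx;
}
```

For n = 10000, a stride of 1 reads up to offset 9999, which is the last valid element. A stride of sizeof(double) = 8 reads up to offset 79992, far beyond the n elements that were allocated.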

Edit: Here is a complete runnable example which works correctly. Note that I switched the code to single precision because I don't presently have access to double-precision-capable hardware:

#include <cuda_runtime.h>
#include <cublas_v2.h>
#include <cstdio>
#include <cstdlib>
#include <ctime>   // time() is declared here


typedef float Real;

void findMaxAndMinGPU(Real* values, int* max_idx, int* min_idx, int n)
{
    Real* d_values;
    cublasHandle_t handle;
    cublasStatus_t stat;
    cudaMalloc((void**) &d_values, sizeof(Real) * n);
    cudaMemcpy(d_values, values, sizeof(Real) * n, cudaMemcpyHostToDevice);
    cublasCreate(&handle);

    stat = cublasIsamax(handle, n, d_values, 1, max_idx);
    if (stat != CUBLAS_STATUS_SUCCESS)
        printf("Max failed\n");

    stat = cublasIsamin(handle, n, d_values, 1, min_idx);
    if (stat != CUBLAS_STATUS_SUCCESS)
        printf("min failed\n");

    cudaFree(d_values);
    cublasDestroy(handle);
}

int main(void)
{
    const int vmax=1000, nvals=10000;

    Real vals[nvals];
    srand ( time(NULL) );
    for(int j=0; j<nvals; j++) {
       vals[j] = Real(rand() % vmax);
    }

    int minIdx, maxIdx;
    findMaxAndMinGPU(vals, &maxIdx, &minIdx, nvals);

    int cmin = 0, cmax=0;
    for(int i=1; i<nvals; i++) {
        cmin = (vals[i] < vals[cmin]) ? i : cmin;
        cmax = (vals[i] > vals[cmax]) ? i : cmax;
    }

    fprintf(stdout, "%d %d %d %d\n", minIdx, cmin, maxIdx, cmax);

    return 0;
}

which, when compiled and run, gives this:

$ g++ -I/usr/local/cuda/include -L/usr/local/cuda/lib cublastest.cc -lcudart -lcublas
$ ./a.out
273 272 85 84

Note that CUBLAS follows the Fortran convention and uses 1-based indexing rather than zero-based indexing, which is why there is a difference of 1 between the CUBLAS and CPU results.
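Because the returned indices are 1-based, they need to be shifted before they are used to subscript a C or C++ array. A minimal sketch, with hypothetical helper names that are not part of CUBLAS:

```cpp
#include <cstddef>

// Hypothetical helper: converts a 1-based BLAS-style index into a
// 0-based C/C++ array offset. Assumes blas_idx >= 1.
inline std::size_t to_zero_based(int blas_idx)
{
    return static_cast<std::size_t>(blas_idx - 1);
}

// Reads the element that a 1-based index refers to.
inline float element_at(const float* values, int blas_idx)
{
    return values[to_zero_based(blas_idx)];
}
```

Applied to the run shown above, the CUBLAS results 273 and 85 map to offsets 272 and 84, matching the CPU loop.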

Answer 2

From the description, the routine returns the index of "the element of the maximum magnitude": http://docs.nvidia.com/cuda/cublas/index.html#topic_6_1

if you have { 1, 2, 3, -33, 22, 11 }

the result will be 4, not 5, because

abs(-33) > 22
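The difference can be checked on the host. The sketch below compares a magnitude-based search (what the amax routines report) with an ordinary signed search on the same data. Both helper names are hypothetical, and both return 1-based positions to match the CUBLAS convention.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical helper: 1-based position of the first element with the
// largest absolute value, mirroring what the amax family reports.
int argmax_abs_1based(const std::vector<double>& v)
{
    if (v.empty()) return 0;
    auto it = std::max_element(v.begin(), v.end(),
        [](double a, double b) { return std::fabs(a) < std::fabs(b); });
    return static_cast<int>(it - v.begin()) + 1;
}

// Hypothetical helper: 1-based position of the largest signed value.
int argmax_signed_1based(const std::vector<double>& v)
{
    if (v.empty()) return 0;
    auto it = std::max_element(v.begin(), v.end());
    return static_cast<int>(it - v.begin()) + 1;
}
```

For { 1, 2, 3, -33, 22, 11 } the first function returns 4 and the second returns 5. When every value is non-negative, as with the rand() % vmax data in Answer 1, the two agree, which is why that test matched the CPU loop.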

Answer 1: 6 (accepted), 2012-04-25 09:21:42

Answer 2: 2, 2013-05-07 16:13:02
