
CUDA, using memset (or fill or ...) to set an array of float to the max value possible

Edit: Thanks for the previous answers, but in fact I want to do this in CUDA, and apparently there is no fill function for CUDA. I have to fill the matrix once for each thread, so I want to make sure I'm using the fastest way possible. Is this for loop my best choice?

I want to set every element of a float matrix to the maximum value a float can hold. What is the correct way to do this?

float *matrix=new float[N*N];

for (int i=0;i<N*N;i++){
    matrix[i]=999999;
}

Thanks in advance.

The easiest approach in CUDA is to use thrust::fill. Thrust is included with CUDA 4.0 and later, or you can install it separately if you are using CUDA 3.2.

#include <limits>
#include <thrust/fill.h>
#include <thrust/device_vector.h>
...
thrust::device_vector<float> v(N*N);
thrust::fill(v.begin(), v.end(), std::numeric_limits<float>::max()); // or 999999.f if you prefer

You could also write pure CUDA code, something like this:

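#include <limits>

template <typename T>
__global__ void initMatrix(T *matrix, int width, int height, T val) {
    // Grid-stride loop: each thread starts at its global index and steps by the grid size.
    int idx = blockIdx.x * blockDim.x + threadIdx.x;

    for (int i = idx; i < width * height; i += gridDim.x * blockDim.x) {
        matrix[i] = val;
    }
}

int main(void) {
    float *matrix = 0;
    cudaMalloc((void**)&matrix, N*N * sizeof(float));  // N is assumed to be defined elsewhere

    int blockSize = 256;
    int numBlocks = (N*N + blockSize - 1) / blockSize;  // round up so every element is covered
    initMatrix<<<numBlocks, blockSize>>>(matrix, N, N,
                                         std::numeric_limits<float>::max()); // or 999999.f if you prefer
    cudaDeviceSynchronize();  // wait for the kernel to finish
    cudaFree(matrix);
    return 0;
}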

You need to iterate through the array and set each float element to std::numeric_limits<float>::max(), which is declared in the <limits> header. You can't use memset for this, because it sets every byte in a memory buffer to a specific value, not a multi-byte value such as a float.

So you would end up with code like the following. Since you're only using a single array for your matrix, you don't need the second for-loop:

#include <limits>

float* matrix = new float[N*N];

for (int i=0; i < N*N; i++)
{
    matrix[i] = std::numeric_limits<float>::max();
}

The second big problem with your request is that memset takes an integral type for the value to set each byte to, so you'd have to get the actual bit pattern of the maximum floating-point value and use that as the input to memset. But even that won't work: memset can only set each byte in a memory buffer to a given value, so if you pass a 32-bit integral value representing a floating-point value to memset, it will only use the lower 8 bits. So in the end it's not just something we advise against; it's impossible given how memset works. You simply can't use memset to initialize a buffer of multi-byte types to a specific value unless you want to zero out the values, or you are doing some odd hack that writes the same value to all the bytes that compose a multi-byte data type.

Use std::numeric_limits<float>::max() with std::fill:

#include <limits>     //for std::numeric_limits<> 
#include <algorithm>  //for std::fill

std::fill(matrix, matrix + N*N, std::numeric_limits<float>::max());

Or use std::fill_n, which reads better:

std::fill_n(matrix, N*N, std::numeric_limits<float>::max());

See the online documentation for std::fill, std::fill_n, and std::numeric_limits.

To get this done easily, I suggest using std::fill from the <algorithm> header instead.

std::fill( matrix, matrix + (N*N), 999999 ) ;

Instead of managing dynamic memory yourself in C++, use std::vector and let it do all the work for you:

std::vector<float> matrix(N * N, std::numeric_limits<float>::max());

In fact, you can even make it a 2D matrix easily:

std::vector<std::vector<float> > matrix(N, std::vector<float>(N, std::numeric_limits<float>::max()));

The C++ way:

std::fill(matrix, matrix + N*N, std::numeric_limits<float>::max());

Is matrix in global memory or thread-local memory? If it is in global memory and you only need to initialize it (rather than reset it in the middle of a kernel), you can call cudaMemset from the host before launching the kernel. If the reset happens in the middle of the kernel, consider breaking the kernel into two pieces so you can still use cudaMemset. Keep in mind that cudaMemset, like memset, writes one byte value to every byte and takes its size in bytes, so it cannot produce std::numeric_limits<float>::max() exactly. The largest finite value a uniform byte fill can give comes from 0x7F, which makes every float 0x7F7F7F7F (about 3.39e38, slightly below the maximum):

cudaMemset(matrix, 0x7F, N*N*sizeof(float));
