CUDA: using memset (or fill, or ...) to set an array of floats to the maximum value

Question

Edit: Thanks for the previous answers, but in fact I want to do this in CUDA, and apparently there is no fill function for CUDA. I have to fill the matrix once for each thread, so I want to make sure I'm using the fastest way possible. Is this for loop my best choice?

I want to set every element of a float matrix to the maximum possible float value. What is the correct way to do this?

float *matrix=new float[N*N];

for (int i=0;i<N*N;i++){
        matrix[i*N+j]=999999;
}

Thanks in advance.

Answer 1

The easiest approach in CUDA is to use thrust::fill. Thrust is included with CUDA 4.0 and later, or you can install it if you are using CUDA 3.2.

#include <limits>                 // for std::numeric_limits
#include <thrust/fill.h>
#include <thrust/device_vector.h>
...
thrust::device_vector<float> v(N*N);
thrust::fill(v.begin(), v.end(), std::numeric_limits<float>::max()); // or 999999.f if you prefer

You could also write pure CUDA code, something like this:

template <typename T>
__global__ void initMatrix(T *matrix, int width, int height, T val) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;

    for (int i = idx; i < width * height; i += gridDim.x * blockDim.x) {
        matrix[i]=val;
    }
}

int main(void) {
    float *matrix = 0;
    cudaMalloc((void**)&matrix, N*N * sizeof(float));

    int blockSize = 256;
    int numBlocks = (N*N + blockSize - 1) / blockSize; // round up so every element gets a thread
    initMatrix<<<numBlocks, blockSize>>>(matrix, N, N, 
                                         std::numeric_limits<float>::max()); // or 999999.f if you prefer
    cudaDeviceSynchronize(); // wait for the kernel to finish
    cudaFree(matrix);
}
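
To see why the grid-stride loop in initMatrix covers every element regardless of how many blocks are launched, the indexing can be emulated on the host in plain C++. This is only a sketch under stated assumptions: ceilDiv and emulateGridStrideFill are hypothetical helper names, not part of CUDA, and the nested loops stand in for blocks and threads that a GPU would run in parallel.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: number of blocks needed so that
// numBlocks * blockSize >= count (integer ceiling division).
inline int ceilDiv(int count, int blockSize) {
    assert(blockSize > 0);
    return (count + blockSize - 1) / blockSize;
}

// Hypothetical host-side emulation of the kernel's indexing.
// Each (block, thread) pair starts at its global index and advances by the
// total number of threads, like the grid-stride loop in the kernel.
// Returns how many times each element was written.
inline std::vector<int> emulateGridStrideFill(std::vector<float>& data,
                                              int numBlocks, int blockSize,
                                              float val) {
    const int count = static_cast<int>(data.size());
    const int stride = numBlocks * blockSize;  // gridDim.x * blockDim.x
    std::vector<int> writes(data.size(), 0);
    for (int block = 0; block < numBlocks; ++block) {
        for (int thread = 0; thread < blockSize; ++thread) {
            for (int i = block * blockSize + thread; i < count; i += stride) {
                data[i] = val;
                ++writes[i];
            }
        }
    }
    return writes;
}
```

For 1000 elements and a block size of 256, ceilDiv(1000, 256) yields 4 blocks. Because of the stride, even a single block still writes every element exactly once; a larger grid only spreads the same work over more threads.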

Answer 2

You need to iterate through the array and set each float element to std::numeric_limits<float>::max(), which is declared in <limits>. You can't use memset for this, since it sets every byte in a memory buffer to a specific value, not a multi-byte value such as a float.

So you would end up with code like the following. Since you're only using a single array for your matrix, you don't need the second for-loop:

#include <limits>

float* matrix = new float[N*N];

for (int i=0; i < N*N; i++)
{
    matrix[i] = std::numeric_limits<float>::max();
}

The second big problem with your request is that memset takes an integral type for the value to set each byte to, so you'd have to get the actual bit pattern of the maximum floating-point value and use that as the input to memset. But even that won't work: memset can only set each byte in a memory buffer to a given value, so if you pass a 32-bit integral value representing a floating-point value to memset, it will use only the lower 8 bits. So in the end it's not just something we advise against; it's impossible given the way memset is implemented. You simply can't use memset to initialize a buffer of multi-byte types to a specific value unless you want to zero out the values, or you are doing some odd hack that writes the same value to all the bytes that compose the multi-byte type.
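
The byte-wise behaviour described above can be checked with a short host-side sketch. It assumes a 32-bit IEEE-754 float, which is typical but not guaranteed by the C++ standard; bitsOf, bitsAfterMemset, and checkMemsetBehaviour are hypothetical helper names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Hypothetical helper: raw bit pattern of a float.
inline std::uint32_t bitsOf(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Hypothetical helper: memset one float with `value` and return the
// resulting bit pattern. memset converts `value` to unsigned char,
// so only its low 8 bits are used, repeated in every byte.
inline std::uint32_t bitsAfterMemset(int value) {
    float f;
    std::memset(&f, value, sizeof f);
    return bitsOf(f);
}

// Checks the claims made in the answer (assuming IEEE-754 floats).
inline void checkMemsetBehaviour() {
    // Bit pattern of the largest finite float.
    assert(bitsOf(std::numeric_limits<float>::max()) == 0x7F7FFFFFu);
    // Passing that pattern to memset keeps only the low byte, 0xFF.
    assert(bitsAfterMemset(0x7F7FFFFF) == 0xFFFFFFFFu);
    // Zero works because every byte of 0.0f is zero.
    assert(bitsAfterMemset(0) == 0u);
    // The "same byte everywhere" hack: large, but not the maximum.
    assert(bitsAfterMemset(0x7F) == 0x7F7F7F7Fu);
}
```

Calling checkMemsetBehaviour() from main() exercises all four cases. The pattern 0xFFFFFFFF is a NaN under IEEE-754, which is why passing the bit pattern of the maximum value to memset does not give the intended result.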

Answer 3

Use std::numeric_limits<float>::max() and std::fill:

#include <limits>     //for std::numeric_limits<> 
#include <algorithm>  //for std::fill

std::fill(matrix, matrix + N*N, std::numeric_limits<float>::max());

Or std::fill_n (which looks better):

std::fill_n(matrix, N*N, std::numeric_limits<float>::max());

See the online documentation for:

std::fill
std::fill_n

Answer 4

I'd suggest taking the easy route and using std::fill from the <algorithm> header instead.

std::fill( matrix, matrix + (N*N), 999999 ) ;

Answer 5

Instead of managing dynamic memory yourself in C++, use std::vector and let it do all the work for you:

std::vector<float> matrix(N * N, std::numeric_limits<float>::max());

In fact, you can even make it a 2D matrix easily:

std::vector<std::vector<float> > matrix(N, std::vector<float>(N, std::numeric_limits<float>::max()));
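
A nested vector stores each row in its own allocation, so the rows are not contiguous in memory. If the data is later copied as one block (for example to a GPU), a flat vector with row-major indexing may be more convenient. The sketch below is one way to do that; FlatMatrix is a hypothetical wrapper, not something taken from the answers.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical wrapper: an N x N matrix stored row-major in one
// contiguous std::vector<float>.
struct FlatMatrix {
    std::size_t n;
    std::vector<float> data;

    // Every element starts at the largest finite float.
    explicit FlatMatrix(std::size_t size)
        : n(size), data(size * size, std::numeric_limits<float>::max()) {}

    // Element access by (row, col); maps to index row * n + col.
    float& at(std::size_t row, std::size_t col) {
        assert(row < n && col < n);
        return data[row * n + col];
    }
};
```

With FlatMatrix m(N), m.data.data() points at N*N contiguous floats, and m.at(i, j) addresses the same element the question's matrix[i*N+j] would.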

Answer 6

The C++ way:

std::fill(matrix, matrix + N*N, std::numeric_limits<float>::max());

Answer 7

Is matrix in global memory or thread-local memory? If it is in global memory, and you only need to initialize it (rather than reset it in the middle of a kernel), then you can use cudaMemset from the host before launching the kernel. If the reset happens in the middle of a kernel, consider breaking the kernel into two pieces so you can still use cudaMemset. Keep in mind that cudaMemset, like memset, takes an int value and writes it to every byte, with the count given in bytes, so it cannot produce std::numeric_limits<float>::max() exactly. Filling with the byte 0x7f gives each float the bit pattern 0x7f7f7f7f, about 3.39e38, which is large and finite but slightly below the true maximum (about 3.40e38):

cudaMemset(matrix, 0x7f, N*N*sizeof(float));

7 answers

Answer 1: score 17, 2011-07-27 00:50:54
Answer 2: score 4, 2011-07-26 20:12:15
Answer 3: score 3, accepted, 2011-07-26 20:15:55
Answer 4: score 2, 2011-07-26 20:14:13
Answer 5: score 2, 2011-07-26 20:44:12
Answer 6: score 1, 2011-07-26 20:15:06
Answer 7: score 1, 2011-07-28 13:48:37
