CUDA, using memset (or fill or …) to set an array of float to max val possible
Edit: Thanks for the previous answers, but in fact I want to do it in CUDA, and apparently there is no Fill function for CUDA. I have to fill the matrix once for each thread, so I want to make sure I'm using the fastest way possible. Is this for loop my best choice?
I want to set a matrix of float to the maximum value possible (in float). What is the correct way of doing this job?
float *matrix=new float[N*N];
for (int i=0;i<N*N;i++){
    matrix[i]=999999;
}
Thanks in advance.
The easiest approach in CUDA is to use thrust::fill. Thrust is included with CUDA 4.0 and later, or you can install it if you are using CUDA 3.2.
#include <limits>
#include <thrust/fill.h>
#include <thrust/device_vector.h>
...
thrust::device_vector<float> v(N*N);
thrust::fill(v.begin(), v.end(), std::numeric_limits<float>::max()); // or 999999.f if you prefer
You could also write pure CUDA code, something like this:
#include <limits>

template <typename T>
__global__ void initMatrix(T *matrix, int width, int height, T val) {
    // Grid-stride loop: each thread writes every (gridDim.x * blockDim.x)-th element.
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    for (int i = idx; i < width * height; i += gridDim.x * blockDim.x) {
        matrix[i] = val;
    }
}
int main(void) {
    float *matrix = 0;
    cudaMalloc((void**)&matrix, N*N * sizeof(float));
    int blockSize = 256;
    // Round up so that numBlocks * blockSize covers all N*N elements.
    int numBlocks = (N*N + blockSize - 1) / blockSize;
    initMatrix<<<numBlocks, blockSize>>>(matrix, N, N,
        std::numeric_limits<float>::max()); // or 999999.f if you prefer
    cudaFree(matrix);
}
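To see why a grid-stride loop covers the whole array regardless of launch size, here is a host-side sketch in plain C++ that runs each simulated thread in turn. It is not CUDA code: initMatrixThread, blocksFor and simulateInit are hypothetical names made up for this illustration, and idx stands in for blockIdx.x * blockDim.x + threadIdx.x.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Host-side stand-in for the kernel body (assumption: not real CUDA).
// "Thread" idx out of totalThreads writes elements idx, idx + totalThreads, ...
template <typename T>
void initMatrixThread(std::vector<T>& matrix, std::size_t idx,
                      std::size_t totalThreads, T val) {
    for (std::size_t i = idx; i < matrix.size(); i += totalThreads) {
        matrix[i] = val;
    }
}

// Ceiling division: the smallest block count whose threads cover numElements.
std::size_t blocksFor(std::size_t numElements, std::size_t blockSize) {
    assert(blockSize > 0);
    return (numElements + blockSize - 1) / blockSize;
}

// Runs every simulated thread sequentially and returns the filled array.
template <typename T>
std::vector<T> simulateInit(std::size_t numElements, std::size_t numBlocks,
                            std::size_t blockSize, T val) {
    std::vector<T> matrix(numElements, T());
    const std::size_t totalThreads = numBlocks * blockSize;
    for (std::size_t idx = 0; idx < totalThreads; ++idx) {
        initMatrixThread(matrix, idx, totalThreads, val);
    }
    return matrix;
}
```

Because of the stride, the fill stays correct even when fewer threads than elements are launched; the rounded-up block count only ensures that each thread has at most one element to write.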
You need to iterate through the array and set each float element to std::numeric_limits<float>::max() (declared in <limits>). You can't use memset for this, since it sets every byte in a memory buffer to a specific value, not a multi-byte value such as a float.
So you would end up with code that looks like the following, since you're only using a single array for your matrix (i.e., you don't need the second for-loop):
#include <limits>
float* matrix = new float[N*N];
for (int i=0; i < N*N; i++)
{
matrix[i] = std::numeric_limits<float>::max();
}
The second huge problem with your request is that memset takes an integral type for the value to set each byte to, so you'd have to get the actual bit pattern of the max floating-point value and use that as the input to memset. But even that won't work, since memset can only set each byte in a memory buffer to a given value. If you pass a 32-bit integral value representing a floating-point value to memset, it's only going to use the lower 8 bits. So in the end it's not just something we're advising you not to do; it's impossible given the way memset is implemented. You simply can't use memset to initialize a memory buffer of multi-byte types to a specific value, unless you want to zero out the values or you are doing some odd hack that writes the same value to all the bytes that compose a multi-byte data type.
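As a rough illustration of the byte-wise behaviour, the sketch below fills a float buffer with memset and inspects the result. It assumes a 32-bit IEEE-754 float; memsetFloats and bitsOf are hypothetical helper names, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <vector>

static_assert(sizeof(float) == 4, "this sketch assumes a 32-bit float");

// Hypothetical helper: writes byteValue to every byte of `count` floats,
// which is all that memset is able to do.
std::vector<float> memsetFloats(std::size_t count, unsigned char byteValue) {
    assert(count > 0);
    std::vector<float> values(count);
    std::memset(values.data(), byteValue, count * sizeof(float));
    return values;
}

// Hypothetical helper: returns the raw bit pattern of a float.
std::uint32_t bitsOf(float value) {
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

With a byte value of 0 every float becomes 0.0f. With 0x7F every float gets the bit pattern 0x7F7F7F7F, which is a large finite value (roughly 3.396e38) but still not std::numeric_limits<float>::max(), whose pattern is 0x7F7FFFFF. That is the kind of "odd hack" memset allows, and it cannot produce the exact maximum.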
Use std::numeric_limits<float>::max() and std::fill as:
#include <limits> //for std::numeric_limits<>
#include <algorithm> //for std::fill
std::fill(matrix, matrix + N*N, std::numeric_limits<float>::max());
Or std::fill_n (which looks better):
std::fill_n(matrix, N*N, std::numeric_limits<float>::max());
See the online documentation for std::numeric_limits, std::fill and std::fill_n.
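For reference, a minimal self-contained sketch of both calls on a raw float buffer. The wrapper names fillWithMax and fillRange are made up for this example; only std::fill, std::fill_n and std::numeric_limits come from the standard library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical wrapper: sets `count` floats starting at `matrix`
// to the largest finite float using std::fill_n.
void fillWithMax(float* matrix, std::size_t count) {
    assert(matrix != nullptr || count == 0);
    std::fill_n(matrix, count, std::numeric_limits<float>::max());
}

// Hypothetical wrapper: sets the half-open range [first, last)
// to an arbitrary value using std::fill.
void fillRange(float* first, float* last, float value) {
    std::fill(first, last, value);
}
```

For an N*N matrix allocated with new float[N*N], this would be called as fillWithMax(matrix, N*N), or fillRange(matrix, matrix + N*N, 999999.f) for a fixed sentinel value.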
To do this easily, I suggest using std::fill from the <algorithm> header instead.
std::fill( matrix, matrix + (N*N), 999999 ) ;
Instead of using dynamic memory in C++, use vector and watch it do all the work for you:
std::vector<float> matrix(N * N, std::numeric_limits<float>::max());
In fact, you can even make it a 2D matrix easily:
std::vector<std::vector<float> > matrix(N, std::vector<float>(N, std::numeric_limits<float>::max()));
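If a single contiguous buffer is preferred over a vector of vectors, the flat vector can still be indexed by row and column. A small sketch, with makeMaxMatrix and at as hypothetical helper names and row-major layout as an assumption:

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: an n-by-n matrix stored as one flat vector,
// with every element set to the largest finite float.
std::vector<float> makeMaxMatrix(std::size_t n) {
    return std::vector<float>(n * n, std::numeric_limits<float>::max());
}

// Hypothetical helper: row-major access into the flat n-by-n matrix.
float& at(std::vector<float>& matrix, std::size_t n,
          std::size_t row, std::size_t col) {
    assert(row < n && col < n);
    return matrix[row * n + col];
}
```

The flat layout keeps all elements in one allocation, which is also the layout a later copy to the GPU would expect.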
The C++ way:
std::fill(matrix, matrix + N*N, std::numeric_limits<float>::max());
Is matrix in global memory or thread-local memory? If it is in global memory and you only need to initialize it (rather than reset it in the middle of a kernel), then you can use cudaMemset from the host before launching the kernel. If the reset happens in the middle of the kernel, consider breaking the kernel into two pieces so you can still use cudaMemset.
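Be aware that cudaMemset, like memset, writes one byte value to every byte, so it cannot store std::numeric_limits<float>::max() exactly. A byte value of 0x7F gives each float the bit pattern 0x7F7F7F7F, a large finite value (roughly 3.396e38) slightly below the true maximum:
cudaMemset(matrix, 0x7F, N*N*sizeof(float));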