I am using MATLAB R2017a. I am running a simple piece of code that calculates the cumulative sum from the first point up to the i-th point.
My CUDA kernel code is:
__global__ void summ(const double *A, double *B, int N){
    for (int i = threadIdx.x; i < N; i++){
        B[i+1] = B[i] + A[i];
    }
}
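For reference, here is what the recurrence in this kernel produces when it is run serially by a single worker starting from zeros. This is a small Python sketch for illustration only (the function name `serial_recurrence` is made up here, not part of the original post). It shows that the result is a running sum shifted by one element, and that the last write lands at index N, so the output needs N + 1 slots rather than N.

```python
from itertools import accumulate

def serial_recurrence(a):
    # B starts as zeros; B[i+1] = B[i] + A[i] for i = 0 .. N-1.
    # The final write goes to B[N], so N + 1 output slots are needed.
    b = [0.0] * (len(a) + 1)
    for i, x in enumerate(a):
        b[i + 1] = b[i] + x
    return b

a = [1.0, 2.0, 3.0, 4.0]
print(serial_recurrence(a))   # [0.0, 1.0, 3.0, 6.0, 10.0]
print(list(accumulate(a)))    # [1.0, 3.0, 6.0, 10.0]
```

With an output array of length N, as in the MATLAB code that follows, the write to B[N] would fall one element past the end.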
My MATLAB code is:
k=parallel.gpu.CUDAKernel('summ.ptx','summ.cu');
n=10^7;
A=rand(n,1);
ans=zeros(n,1);
A1=gpuArray(A);
ans2=gpuArray(ans);
k.ThreadBlockSize = [1024,1,1];
k.GridSize = [3,1];
G = feval(k,A1,ans2,n);
G1 = gather(G);
GPU_time = toc
I am wondering why the GPU time increases when I increase the grid size (k.GridSize). For instance, for 10^7 data points:
k.GridSize=[1,1] the time is 8.0748s
k.GridSize=[2,1] the time is 8.0792s
k.GridSize=[3,1] the time is 8.0928s
From what I understand, 10^7 data points at 1024 threads per block would need 10^7 / 1024 ≈ 9766 blocks, so the grid size should be [9766,1].
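As a quick check of that arithmetic, the number of blocks needed to give each element its own thread is the ceiling of n divided by the block size. The Python sketch below is illustrative only, and `blocks_needed` is a hypothetical helper name.

```python
def blocks_needed(n, threads_per_block):
    # Ceiling division: enough blocks that blocks * threads_per_block >= n.
    return (n + threads_per_block - 1) // threads_per_block

print(blocks_needed(10**7, 1024))  # 9766
```

This count only matters for a kernel that derives its element index from both the block index and the thread index. The kernel above uses threadIdx.x alone.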
The GPU device is
Name: 'Tesla K20c'
Index: 1
ComputeCapability: '3.5'
SupportsDouble: 1
DriverVersion: 9.1000
ToolkitVersion: 8
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 5.2983e+09
AvailableMemory: 4.9132e+09
MultiprocessorCount: 13
ClockRateKHz: 705500
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 0
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1
Thank you for your response.
You appear to be worrying about a very small portion of the time compared to the overall run time. The real question you should be asking is: does this amount of time to solve this problem make sense? The answer to that is no, absolutely not.
Here is a modified version which should run much faster:
n=10^7;
dev = gpuDevice;
A = randn(n,1,'gpuArray');
B = randn(n,1,'gpuArray');
tic
G = A+cumsum(B);
wait(dev)
toc
On my 1060 this runs in 0.03 seconds. For even faster speeds you can use single precision.
At any rate, that difference of about 0.02 seconds between your grid sizes could easily be attributable to small changes in load on your GPU. That is a much more likely explanation than anything to do with grid sizes.
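Separately from the timing noise, it may help to count how much work the posted kernel actually does. This is one reading of the kernel source, sketched in Python, not a profile of the real run (the function names are made up for illustration). Each thread starts its loop at threadIdx.x and runs to N, and blockIdx.x never enters the index, so every block repeats the same loop instead of taking a share of the data.

```python
def iterations_per_block(n, threads_per_block):
    # Thread t starts at i = t (threadIdx.x) and loops while i < n,
    # so it performs n - t iterations (none when t >= n).
    return sum(max(n - t, 0) for t in range(threads_per_block))

def total_iterations(n, threads_per_block, blocks):
    # blockIdx.x is not used in the kernel, so each block redoes the same work.
    return blocks * iterations_per_block(n, threads_per_block)

n = 10**7
for blocks in (1, 2, 3):
    print(blocks, total_iterations(n, 1024, blocks))
```

Under this reading, one block already performs roughly 10^10 loop iterations for n = 10^7, and adding blocks adds repeated work without dividing it. The threads also read and write overlapping elements of B without any synchronization, so the result is not well defined.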