CUDA doesn't work as expected?

Question

I have written the following CUDA code.

unsigned long mask_buffer;
int s;
off_t p;

for(p=0;p!=5000;p++)
{
    for(s=start;s!=end;s++)
    {
        ref_off = *(((unsigned int*)(idx_base)) + p);

        if((int)(first_indexes[s-start_sequence] % 8 - ref_off % 8) < 0)
        {
            int shamt2 = (ref_off % 8 - first_indexes[s-start_sequence] % 8);
            mask_buffer = *((unsigned long *)(msk_base + (ref_off - first_indexes[s-start_sequence])/8)) >> shamt2;

            if( ( (*(unsigned long *)(seqmaskc + 16 * (s-start_sequence))) ^ mask_buffer ) << shamt2) 
                continue;
        }

        else if((int)(first_indexes[s-start_sequence] % 8 - ref_off % 8) == 0)
        {
            mask_buffer = *((unsigned long *)(msk_base + (ref_off)/8));

            if( (*(unsigned long *)(seqmaskc + 16 * (s-start_sequence)) ^ mask_buffer))
                continue;
        }

        else
        {
            int shamt2 = 8 - (first_indexes[s-start_sequence] % 8 - ref_off % 8);
            mask_buffer = *((unsigned long *)(msk_base + (ref_off/8- first_indexes[s-start_sequence]/8) - 1)) >> shamt2;

            if( ( (*(unsigned long *)(seqmaskc + 16 * (s-start_sequence))) ^ mask_buffer ) << shamt2) 
                continue;
        }

        int shamt = (ref_off % 4 - first_indexes[s-start_sequence] % 4) * 2;

        memcpy(reference_blk, ref_base + ref_off / 4 - first_indexes[s-start_sequence] / 4, sequence_bytes);

        for (rp = last_rp ; rp != (unsigned long *) reference_blk ; rp--) 
        {
            unsigned long tmp = ((*rp) & ((1 << shamt) - 1)) << (8 * sizeof(unsigned long) - shamt);
            *rp = (*rp >> shamt) | shifted_in;
            shifted_in = tmp;
        }

        *rp = (*rp >> shamt) | shifted_in;

        if (sequence_length & 0x3)
            reference_blk[sequence_length >> 2] &= (1 << ((sequence_length & 0x3) << 1)) - 1;

        for ( i = sequence_length >> 2 ; i & (SEQUENCE_ALIGN - 1) ; i++ )
            reference_blk[i] = 0;

        //-- instead of memcmp --//
        int v = 0;
        char *p1 = (char *)sequence;
        char *p2 = (char *)reference_blk;
        int tmp_asd = sequence_bytes;

        while(tmp_asd!=0)
        {
            v = *(p1++) - *(p2++);

            if(v!=0)
                break;

            tmp_asd--;
        }

        if(v == 0)
        {
            mat_count[s - (int)start_sequence]++;      /* Maintain count */
            mat_position[s - (int)start_sequence] = ref_off-first_indexes[s-start_sequence]; /* Record latest position */
        }

    }
}

This for loop is the main part of my code. The problem is that the variable "p" never gets past 5 or 6. I have a GT530 in my computer, and my CUDA driver version and runtime version are both 4.0. What is the problem with this code?

Answer 1

Assuming this is indeed a kernel, as your original post claimed before you edited in the new code, keep in mind that a kernel can only call __device__ functions; host-only C-library functions cannot be called from inside a CUDA kernel. Your code calls memcpy about halfway through, so make sure the memcpy it resolves to is one that is actually available in device code (either a device-side version provided by your toolkit or your own re-implementation in CUDA). If it is a host-only function, that call inside your kernel is not going to work.
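To illustrate the point about __device__ functions, here is a minimal sketch of a hand-written byte copy that a kernel can call. The names copy_bytes and copy_records, the record size, and the launch configuration are all made up for this example and are not taken from the question's code; treat it as one possible shape, not as a drop-in replacement.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper (not from the question): a byte-wise copy that is
// explicitly compiled for the device, so a kernel is allowed to call it.
__device__ void copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = static_cast<unsigned char *>(dst);
    const unsigned char *s = static_cast<const unsigned char *>(src);
    for (size_t i = 0; i < n; ++i)
        d[i] = s[i];
}

// Hypothetical kernel: thread r copies record r from in to out.
__global__ void copy_records(unsigned char *out, const unsigned char *in,
                             size_t record_bytes, int num_records)
{
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r < num_records)
        copy_bytes(out + r * record_bytes, in + r * record_bytes, record_bytes);
}

int main()
{
    const int num_records = 4;   // assumed sizes, chosen only for the demo
    const int record_bytes = 8;
    const int total = num_records * record_bytes;

    unsigned char h_in[total], h_out[total];
    for (int i = 0; i < total; ++i) {
        h_in[i] = static_cast<unsigned char>(i);
        h_out[i] = 0;
    }

    unsigned char *d_in = 0, *d_out = 0;
    cudaMalloc((void **)&d_in, total);
    cudaMalloc((void **)&d_out, total);
    cudaMemcpy(d_in, h_in, total, cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, total);

    copy_records<<<1, num_records>>>(d_out, d_in, record_bytes, num_records);

    // Check both the launch itself and the kernel's execution.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(h_out, d_out, total, cudaMemcpyDeviceToHost);
    printf("%s\n", memcmp(h_in, h_out, total) == 0 ? "copy ok" : "copy mismatch");

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

Checking cudaGetLastError and the return value of cudaDeviceSynchronize after the launch is worth doing in the original program as well, since a kernel that aborts partway through otherwise fails silently.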

Also, if your kernel needs to check results that other threads in the group have written, as the memory-comparison section of your kernel appears to do, you will need to place synchronization points in your code so that all the threads have reached a certain point before any of them starts checking those results. If you are attempting to check what another block may have written to main (global) memory, there are primitives for that as well, which ensure that the writes of a previous thread block have actually been written out to main memory.
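As a rough sketch of the in-block case, the kernel below has every thread write one value to shared memory and then read its neighbour's value. The __syncthreads() barrier between the write and the read is the synchronization point described above. The kernel name neighbour_sum, the block size, and the data are assumptions made for the example. For the cross-block case, __threadfence() is the memory fence usually meant, but note that it only orders a thread's memory writes and is not a barrier between blocks.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK_SIZE 64   // assumed block size for this demo

// Hypothetical kernel: out[t] = in[t] + in[(t + 1) % BLOCK_SIZE],
// where the neighbour's value is read from shared memory.
__global__ void neighbour_sum(const int *in, int *out)
{
    __shared__ int tile[BLOCK_SIZE];
    int t = threadIdx.x;

    tile[t] = in[t];     // each thread writes its own slot

    // Barrier: no thread continues until every thread in the block has
    // written its slot. Without it, tile[next] may not be written yet.
    __syncthreads();

    int next = (t + 1) % BLOCK_SIZE;
    out[t] = tile[t] + tile[next];
}

int main()
{
    int h_in[BLOCK_SIZE], h_out[BLOCK_SIZE];
    for (int i = 0; i < BLOCK_SIZE; ++i)
        h_in[i] = i;

    int *d_in = 0, *d_out = 0;
    cudaMalloc((void **)&d_in, sizeof(h_in));
    cudaMalloc((void **)&d_out, sizeof(h_out));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    neighbour_sum<<<1, BLOCK_SIZE>>>(d_in, d_out);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);

    // Verify on the host against the same formula the kernel implements.
    int ok = 1;
    for (int i = 0; i < BLOCK_SIZE; ++i)
        if (h_out[i] != h_in[i] + h_in[(i + 1) % BLOCK_SIZE])
            ok = 0;
    printf("%s\n", ok ? "sync ok" : "sync mismatch");

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

Note that __syncthreads() must be reached by every thread in the block, so it should not sit inside a branch that only some threads take.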

solution1
1 2012-03-28 04:07:29