简体   繁体   中英

Why is the kiss_fft's forward and inverse radix-4 calculation different, part 2?

Part 1 - why the code below checks st_inverse in the first place

The kiss_fft code has this branch inside a loop :

do {
    if(st->inverse) {
        Fout[m].r = scratch[5].r - scratch[4].i;
        Fout[m].i = scratch[5].i + scratch[4].r;
        Fout[m3].r = scratch[5].r + scratch[4].i;
        Fout[m3].i = scratch[5].i - scratch[4].r;
    }else{
        Fout[m].r = scratch[5].r + scratch[4].i;
        Fout[m].i = scratch[5].i - scratch[4].r;
        Fout[m3].r = scratch[5].r - scratch[4].i;
        Fout[m3].i = scratch[5].i + scratch[4].r;
    }
    ++Fout;
} while (--k); // Fout[] has k*4 elements.

Slightly reordered:

if(st->inverse) {
    Fout[m].r = scratch[5].r - scratch[4].i;
    Fout[m].i = scratch[5].i + scratch[4].r;
    Fout[m3].r = scratch[5].r + scratch[4].i;
    Fout[m3].i = scratch[5].i - scratch[4].r;
}else{
    Fout[m3].r = scratch[5].r - scratch[4].i;
    Fout[m3].i = scratch[5].i + scratch[4].r
    Fout[m].r = scratch[5].r + scratch[4].i;
    Fout[m].i = scratch[5].i - scratch[4].r;;
}

The two code blocks really differ only in their use of m and m3 . But m and m3 are not changed inside the loop. Can I simply eliminate this inner-loop branch by swapping m and m3 ?

if(st->inverse) { swap(&m, &m3); }
do {
    Fout[m].r = scratch[5].r - scratch[4].i;
    Fout[m].i = scratch[5].i + scratch[4].r;
    Fout[m3].r = scratch[5].r + scratch[4].i;
    Fout[m3].i = scratch[5].i - scratch[4].r;
   ++Fout;
} while (--k);

I can indeed use that optimization. It's not necessary however with current-gen compilers that can use AVX. They'll eliminate that branch as well, using vpcmpeqd and vblendvps .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM