Register shortage when using SSE intrinsics

Question

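In the post "SSE load/store memory transactions" I asked about the difference between explicit register-memory transactions and intermediate pointers. In practice, intermediate pointers showed slightly higher performance, but it is not clear to me what an intermediate pointer is in hardware terms. If a pointer is created, does that mean a register is occupied as well, or is a register only used while an SSE operation (e.g. _mm_mul_ps) is executing?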

Let's consider this example:

#include <xmmintrin.h>   // SSE intrinsics; also provides _mm_malloc/_mm_free on GCC and Clang

struct sse_simple
{
    // InputLength is the number of floats; it is assumed to be a multiple of 4.
    // The buffer size is assumed to be InputLength floats.
    sse_simple(unsigned int InputLength):
        input1((float*)_mm_malloc(InputLength*sizeof(float), 16)),
        input2((float*)_mm_malloc(InputLength*sizeof(float), 16)),
        output((float*)_mm_malloc(InputLength*sizeof(float), 16)),
        inp1_sse(reinterpret_cast<__m128*>(input1)),
        inp2_sse(reinterpret_cast<__m128*>(input2)),
        output_sse(reinterpret_cast<__m128*>(output)),
        Len(InputLength/4)   // number of 4-float vectors
    {}

    ~sse_simple()
    {
        _mm_free(input1);
        _mm_free(input2);
        _mm_free(output);
    }

    void func()
    {
        // Multiply the inputs four floats at a time.
        for(unsigned int i=0; i<Len; ++i)
            output_sse[i] = _mm_mul_ps(inp1_sse[i], inp2_sse[i]);
    }

    // 16-byte aligned buffers
    float *input1;
    float *input2;
    float *output;

    // The same buffers viewed as arrays of 4-float vectors
    __m128 *inp1_sse;
    __m128 *inp2_sse;
    __m128 *output_sse;

    unsigned int Len;
};

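In the example above, the intermediate pointers inp1_sse, inp2_sse and output_sse are created once, in the constructor. If I create a large number of copies of the sse_simple object (e.g. 50 000 or more), will this lead to a register shortage?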

Answer 1

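First of all, registers are small memories located close to the compute units, which means access to them is very fast. The compiler tries to use them as much as possible to speed up computation, and when it cannot, it uses memory instead. Because the amount of storage in registers is small, registers are usually used only as temporary storage during a computation. Most of the time everything ends up stored in memory, except for temporaries such as loop indices. A shortage of registers therefore only slows the computation down.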
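To make the point about objects and memory concrete, here is a minimal sketch. The names three_pointers and bytes_for_objects are hypothetical stand-ins, not part of the question's code. It only illustrates that creating many objects that hold pointers costs memory proportional to the object count; registers are assigned by the compiler to the code of a function, not to each object.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for sse_simple: only the pointer members matter here.
struct three_pointers
{
    float *input1;
    float *input2;
    float *output;
};

// Returns the number of bytes of memory occupied by `count` such objects.
// No register is reserved per object; the pointers are plain data in memory
// until a function loads them into a register to use them.
std::size_t bytes_for_objects(std::size_t count)
{
    std::vector<three_pointers> objects(count);   // value-initialised: all pointers are null
    assert(objects.size() == count);
    return objects.size() * sizeof(three_pointers);
}
```

Calling bytes_for_objects(50000) reports 50 000 times sizeof(three_pointers), which on a typical 64-bit platform would be on the order of a megabyte of ordinary memory.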

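During a computation, pointers are held in general-purpose registers (GPRs), whether they point to floats, vectors or anything else, while __m128 vectors are held in dedicated registers (the XMM registers).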

So in your example the three arrays are stored in memory, and the line

output_sse[i] = _mm_mul_ps(inp1_sse[i], inp2_sse[i]);

compiles to:

movaps -0x30(%rbp),%xmm0    # load inp1_sse[i] in register %xmm0
movaps -0x20(%rbp),%xmm1    # load inp2_sse[i] in register %xmm1
mulps  %xmm1,%xmm0          # perform the multiplication the result is stored in %xmm0
movaps %xmm0,(%rdx)         # store the result in memory

As you can see, the registers %rbp and %rdx are used to hold the pointers.
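As a sketch of the two styles the question compares, the functions below multiply two arrays once through intermediate __m128 pointers and once with explicit aligned loads and stores. The function names and the test data are assumptions made for illustration. Both variants need the same kind of load, mulps and store instructions, so one would expect an optimising compiler to produce very similar code for them, though that is worth checking in the generated assembly for your own compiler and flags.

```cpp
#include <cassert>
#include <cstdint>
#include <xmmintrin.h>

// Variant 1: intermediate __m128 pointers, as in the question.
// All buffers must be 16-byte aligned; n_vec is the number of 4-float vectors.
void mul_via_pointers(const float *a, const float *b, float *out, unsigned int n_vec)
{
    assert(reinterpret_cast<std::uintptr_t>(a) % 16 == 0);
    const __m128 *a_sse = reinterpret_cast<const __m128*>(a);
    const __m128 *b_sse = reinterpret_cast<const __m128*>(b);
    __m128 *out_sse = reinterpret_cast<__m128*>(out);
    for (unsigned int i = 0; i < n_vec; ++i)
        out_sse[i] = _mm_mul_ps(a_sse[i], b_sse[i]);
}

// Variant 2: explicit aligned loads and stores on the float buffers.
void mul_via_load_store(const float *a, const float *b, float *out, unsigned int n_vec)
{
    assert(reinterpret_cast<std::uintptr_t>(a) % 16 == 0);
    for (unsigned int i = 0; i < n_vec; ++i)
    {
        __m128 x = _mm_load_ps(a + 4 * i);   // load 4 floats into an XMM register
        __m128 y = _mm_load_ps(b + 4 * i);
        _mm_store_ps(out + 4 * i, _mm_mul_ps(x, y));
    }
}

// Runs both variants on the same small input and compares the results.
bool variants_agree()
{
    alignas(16) float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    alignas(16) float b[8] = {2, 2, 2, 2, 0.5f, 0.5f, 0.5f, 0.5f};
    alignas(16) float r1[8];
    alignas(16) float r2[8];
    mul_via_pointers(a, b, r1, 2);
    mul_via_load_store(a, b, r2, 2);
    for (int i = 0; i < 8; ++i)
        if (r1[i] != r2[i])
            return false;
    return true;
}
```

In either variant the pointers occupy general-purpose registers only while the loop runs, and the vectors pass through XMM registers only for the duration of each multiplication.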

Answer 1: score 2, accepted, 2013-07-17 10:55:59
