PCLMULQDQ instruction in C inline assembly

Question

I want to use Intel's PCLMULQDQ instruction with inline assembly in my C code to multiply two polynomials that are elements of GF(2^n). The compiler is GCC 4.8.1. The polynomials are stored in arrays of uint32_t (6 elements each).

I have already searched the web for how to use the PCLMULQDQ instruction or the CLMUL instruction set properly, but didn't find any good documentation.

I would really appreciate a simple example in C and asm of how to multiply two simple polynomials with this instruction. Does anybody know how to do it?

Also, are there any prerequisites besides a capable processor, such as libraries to include or compiler options?

Answer 1

I have already found a solution. For the record:

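#include <stdint.h>
#include <string.h>

// Carry-less product of A and B over GF(2); no modular reduction is done.
void f2m_intel_mult(
  uint32_t t, // length of arrays A and B; C must hold 2*t words
  const uint32_t *A,
  const uint32_t *B,
  uint32_t *C
)
{
    memset(C, 0, 2*t*sizeof(uint32_t));
    union{ uint64_t val; struct{uint32_t low; uint32_t high;} halfs;} prod;

    uint32_t i;
    uint32_t j;
    for(i=0; i<t; i++){
        for(j=0; j<t; j++){

            prod.halfs.low = A[i];
            prod.halfs.high = 0;
            // AT&T operand order: immediate, source, destination.
            // The immediate must be a compile-time constant; 0 selects
            // the low quadword of each operand.
            asm ("pclmulqdq %2, %1, %0;"
            : "+x"(prod.val)
            : "x"((uint64_t)B[j]), "i"(0)
            );

            // The 64-bit partial product spans words i+j and i+j+1.
            C[i+j] = C[i+j] ^ prod.halfs.low;
            C[i+j+1] = C[i+j+1] ^ prod.halfs.high;
        }
    }
}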
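
To check the routine above on a machine without the instruction, it can be compared against a portable shift-and-XOR version. The following is only a sketch and not part of the original answer; the names clmul32_ref and f2m_ref_mult are made up for illustration. It computes the same carry-less product in GF(2)[x], again without reduction modulo a field polynomial.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: carry-less product of two 32-bit polynomials over GF(2). */
uint64_t clmul32_ref(uint32_t a, uint32_t b)
{
    uint64_t r = 0;
    for (int k = 0; k < 32; k++) {
        if ((b >> k) & 1u)
            r ^= (uint64_t)a << k;   /* add (XOR) a * x^k */
    }
    return r;
}

/* Hypothetical reference for f2m_intel_mult; C must hold 2*t words. */
void f2m_ref_mult(uint32_t t, const uint32_t *A, const uint32_t *B, uint32_t *C)
{
    memset(C, 0, 2 * t * sizeof(uint32_t));
    for (uint32_t i = 0; i < t; i++) {
        for (uint32_t j = 0; j < t; j++) {
            uint64_t p = clmul32_ref(A[i], B[j]);
            C[i + j]     ^= (uint32_t)p;          /* low half of partial product */
            C[i + j + 1] ^= (uint32_t)(p >> 32);  /* high half */
        }
    }
}
```

For example, multiplying x + 1 by itself (both inputs 3) gives x^2 + 1, so clmul32_ref(3, 3) returns 5.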

I think it is possible to use 64-bit operands with pclmulqdq, but I couldn't figure out how to get this working with inline assembly. Does anybody know how?
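
One way this might work, as a sketch rather than a confirmed answer: pass both 64-bit values in XMM registers through a GCC vector type and read the 128-bit result back lane by lane. The type v2u64 and the functions clmul64_asm and clmul64_soft are hypothetical names. The sketch assumes a GCC-compatible compiler on x86-64 that accepts "pclmul" in __builtin_cpu_supports (newer GCC and Clang do; GCC 4.8.1 may not). The portable fallback is only there so the code still builds and runs where PCLMULQDQ is unavailable.

```c
#include <stdint.h>

/* Hypothetical portable fallback: 64x64 -> 128-bit carry-less product. */
void clmul64_soft(uint64_t a, uint64_t b, uint64_t *lo, uint64_t *hi)
{
    uint64_t l = 0, h = 0;
    for (int k = 0; k < 64; k++) {
        if ((b >> k) & 1u) {
            l ^= a << k;
            if (k)
                h ^= a >> (64 - k);   /* bits shifted past bit 63 */
        }
    }
    *lo = l;
    *hi = h;
}

#if defined(__x86_64__) && defined(__GNUC__)
/* GCC vector extension: two 64-bit lanes, fits one XMM register. */
typedef uint64_t v2u64 __attribute__((vector_size(16)));
#endif

/* Hypothetical wrapper: uses PCLMULQDQ when the CPU reports support. */
void clmul64_asm(uint64_t a, uint64_t b, uint64_t *lo, uint64_t *hi)
{
#if defined(__x86_64__) && defined(__GNUC__)
    if (__builtin_cpu_supports("pclmul")) {
        v2u64 va = { a, 0 };
        v2u64 vb = { b, 0 };
        /* Immediate 0: low quadword of va times low quadword of vb. */
        __asm__ ("pclmulqdq $0, %1, %0" : "+x"(va) : "x"(vb));
        *lo = va[0];
        *hi = va[1];
        return;
    }
#endif
    clmul64_soft(a, b, lo, hi);
}
```

With 64-bit operands the inner loop of the multiplication would process two uint32_t words per instruction, at the cost of handling a 128-bit partial product.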
The same can also be done with intrinsics. (If you want the code, just ask.)
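
A sketch of what an intrinsics version might look like, using _mm_clmulepi64_si128 from <wmmintrin.h>. This is not the original poster's code: the names clmul32_hw, clmul32_soft, clmul32 and f2m_intrin_mult are hypothetical. The target attribute is an assumption that holds for GCC 4.9 or later and Clang; with GCC 4.8.1 the file would probably have to be compiled with -mpclmul instead. The software path exists only so the sketch runs on CPUs without the instruction.

```c
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) && defined(__GNUC__)
#include <emmintrin.h>   /* SSE2: _mm_cvtsi32_si128, _mm_cvtsi128_si64 */
#include <wmmintrin.h>   /* PCLMUL: _mm_clmulepi64_si128 */

/* Hypothetical helper: 32x32 -> 64-bit carry-less product in hardware. */
__attribute__((target("pclmul,sse2")))
static uint64_t clmul32_hw(uint32_t a, uint32_t b)
{
    __m128i va = _mm_cvtsi32_si128((int)a);   /* upper bits are zeroed */
    __m128i vb = _mm_cvtsi32_si128((int)b);
    __m128i p  = _mm_clmulepi64_si128(va, vb, 0x00);  /* low x low quadword */
    return (uint64_t)_mm_cvtsi128_si64(p);
}
#endif

/* Hypothetical portable fallback. */
static uint64_t clmul32_soft(uint32_t a, uint32_t b)
{
    uint64_t r = 0;
    for (int k = 0; k < 32; k++)
        if ((b >> k) & 1u)
            r ^= (uint64_t)a << k;
    return r;
}

/* Dispatch: hardware if the CPU reports PCLMUL, otherwise software. */
uint64_t clmul32(uint32_t a, uint32_t b)
{
#if defined(__x86_64__) && defined(__GNUC__)
    if (__builtin_cpu_supports("pclmul"))
        return clmul32_hw(a, b);
#endif
    return clmul32_soft(a, b);
}

/* Same schoolbook loop as f2m_intel_mult; C must hold 2*t words. */
void f2m_intrin_mult(uint32_t t, const uint32_t *A, const uint32_t *B, uint32_t *C)
{
    memset(C, 0, 2 * t * sizeof(uint32_t));
    for (uint32_t i = 0; i < t; i++) {
        for (uint32_t j = 0; j < t; j++) {
            uint64_t p = clmul32(A[i], B[j]);
            C[i + j]     ^= (uint32_t)p;
            C[i + j + 1] ^= (uint32_t)(p >> 32);
        }
    }
}
```

In real code the CPU check would normally be done once outside the loop; it is kept inside clmul32 here only to keep the sketch short.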
Besides, the calculation can be optimized further with Karatsuba if you know the size t of the arrays.
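
To illustrate the Karatsuba idea, here is a minimal one-level sketch for t = 2, written with a portable multiply so it runs anywhere. The names clmul32_base and f2m_karatsuba2 are hypothetical. It replaces the four word products of the schoolbook method with three, using the fact that addition and subtraction in GF(2) are both XOR.

```c
#include <stdint.h>

/* Hypothetical base multiply: 32x32 -> 64-bit carry-less product.
   In practice this would be the PCLMULQDQ-based multiply. */
uint64_t clmul32_base(uint32_t a, uint32_t b)
{
    uint64_t r = 0;
    for (int k = 0; k < 32; k++)
        if ((b >> k) & 1u)
            r ^= (uint64_t)a << k;
    return r;
}

/* One Karatsuba level for two-word operands, with X = x^32:
   (A1*X + A0)(B1*X + B0) = hi*X^2 + mid*X + lo, where
   mid = (A0^A1)(B0^B1) ^ lo ^ hi. */
void f2m_karatsuba2(const uint32_t A[2], const uint32_t B[2], uint32_t C[4])
{
    uint64_t lo  = clmul32_base(A[0], B[0]);
    uint64_t hi  = clmul32_base(A[1], B[1]);
    uint64_t mid = clmul32_base(A[0] ^ A[1], B[0] ^ B[1]) ^ lo ^ hi;

    C[0] = (uint32_t)lo;
    C[1] = (uint32_t)(lo >> 32) ^ (uint32_t)mid;
    C[2] = (uint32_t)hi ^ (uint32_t)(mid >> 32);
    C[3] = (uint32_t)(hi >> 32);
}
```

For the 6-word arrays in the question, the same split could presumably be applied recursively or in a three-way variant; whether that beats the plain double loop would have to be measured.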

Answer 1: accepted, 2014-01-28 19:14:58
