Binary matrix vector multiplication

I want to multiply an 8x8 binary matrix, represented as an unsigned 64-bit integer, by an 8-bit vector represented as an unsigned char. However, due to some other issues the matrix must be ordered by columns, so the bytes do not line up for a straightforward multiplication.

Any idea how to speed up such a calculation? Every operation counts, because I need to perform billions of these calculations.

The multiplications are done over the two-element field (F-2).

With this matrix and vector representation, it helps to do matrix multiplication this way:

(col_1 ... col_8) * (v_1 ... v_8)^T = col_1 * v_1 + ... + col_8 * v_8

where matrix A = (col_1 ... col_8)

and column vector v = (v_1 ... v_8)^T

Thinking this further, you can do all multiplications at once if you inflate the 8-bit vector to a 64-bit vector by repeating every bit 8 times and then calculating P = A & v_inflated. The only thing left then is the addition (i.e. XOR) of the products.
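As a minimal sketch of the inflation step, the following C code repeats every bit of the vector 8 times and forms the products. The function names `inflate` and `products` are made up for illustration, and the bit layout is an assumption the question does not spell out: column j of the matrix is taken to occupy byte j of the 64-bit word, and bit j of the vector is v_j.

```c
#include <stdint.h>

/* Assumed layout (not stated in the question): column j of the matrix
   occupies byte j of the 64-bit word, and bit j of v is v_j. */
static uint64_t inflate(uint8_t v)
{
    uint64_t out = 0;
    for (int j = 0; j < 8; ++j) {
        if (v & (1 << j))
            out |= (uint64_t)0xFF << (8 * j);  /* byte j becomes all ones */
    }
    return out;
}

/* Byte j of the result is col_j if v_j is set, and 0 otherwise. */
static uint64_t products(uint64_t A, uint8_t v)
{
    return A & inflate(v);
}
```

The loop is only meant to show what the inflated vector looks like; in a hot path it would probably be replaced by something cheaper, such as a precomputed table.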

A simple approach for XORing the products is:

uint64_t P = A & v_inflated;  // products, computed as described above
uint64_t sum = 0;
for (int i = 8; i; --i)
{
    sum ^= P & 0xFF;  // add (XOR) the lowest byte into the sum
    P >>= 8;          // move the next product into the lowest byte
}
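Since every operation counts, one possible alternative to the 8-iteration loop is to fold the 64-bit word onto itself, which XORs all eight bytes together in three shift/XOR steps. This is a sketch, not something either answer proposes, and the name `xor_bytes` is made up for illustration.

```c
#include <stdint.h>

/* XOR all eight bytes of P together using three folding steps. */
static uint8_t xor_bytes(uint64_t P)
{
    P ^= P >> 32;  /* fold the upper 4 bytes onto the lower 4 */
    P ^= P >> 16;  /* fold the remaining 4 bytes down to 2 */
    P ^= P >> 8;   /* fold the remaining 2 bytes down to 1 */
    return (uint8_t)(P & 0xFF);
}
```

After the last step the lowest byte holds the XOR of all eight original bytes; the higher bytes contain partial results and are masked off.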

There are only 256 possible vectors, so use a lookup table to generate the right bitmasks. Your logic will then be something like:

output_bit_n = bool(matrix[n] & lookup[vector])

In other words, your lookup table can transpose an 8-bit value into the 64-bit world.

You can efficiently pack this into the result with rotate-with-carry instructions if the compiler isn't smart enough to optimise (value<<=1)|=result.
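One way the lookup-table idea might be combined with the column-wise product is sketched below. It is an interpretation, not the answerer's exact scheme: the table maps each of the 256 vectors to a 64-bit mask in which byte j is all ones when bit j of the vector is set, and the products are then summed with XOR, because over F-2 each output bit is the parity of the selected matrix bits. The names `inflate_lut`, `init_inflate_lut` and `matvec_gf2` are hypothetical, and the layout (column j in byte j) is an assumption.

```c
#include <stdint.h>

/* Hypothetical table: inflate_lut[v] has byte j set to 0xFF iff bit j of v is set. */
static uint64_t inflate_lut[256];

/* Fill the table once, before any multiplication is done. */
static void init_inflate_lut(void)
{
    for (int v = 0; v < 256; ++v) {
        uint64_t m = 0;
        for (int j = 0; j < 8; ++j) {
            if (v & (1 << j))
                m |= (uint64_t)0xFF << (8 * j);
        }
        inflate_lut[v] = m;
    }
}

/* Multiply a column-ordered 8x8 matrix by an 8-bit vector over F-2.
   Assumes column j of A is stored in byte j. */
static uint8_t matvec_gf2(uint64_t A, uint8_t v)
{
    uint64_t P = A & inflate_lut[v];  /* keep col_j where v_j is 1 */
    P ^= P >> 32;                     /* XOR the eight bytes together */
    P ^= P >> 16;
    P ^= P >> 8;
    return (uint8_t)(P & 0xFF);
}
```

The table costs 2 KiB, so it should stay cached when billions of products are computed; `init_inflate_lut` has to run once before the first call to `matvec_gf2`.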
