Binary matrix vector multiplication
I want to multiply an 8x8 binary matrix, represented as an unsigned 64-bit integer, by an 8-bit vector, represented as an unsigned char. However, due to some other constraints the matrix must be stored by columns, so the bytes do not line up for a straightforward multiplication.
Any idea how to speed up such a calculation? Every operation counts, because I need to perform billions of these calculations.
The multiplications are over the two-element field F_2, i.e. GF(2).
With this matrix and vector representation, it helps to do the matrix multiplication this way:
(col_1 ... col_8) * (v_1 ... v_8)^T = col_1 * v_1 + ... + col_8 * v_8
where the matrix A = (col_1 ... col_8)
and the column vector v = (v_1 ... v_8)^T
Thinking this further, you can do all the multiplications at once if you inflate the 8-bit vector to a 64-bit vector by repeating every bit 8 times and then calculating P = A & v_inflated. The only thing left then is the addition (i.e. XOR) of the products.
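The inflation step could be sketched as follows. This is only an illustration under an assumed layout that the original does not spell out: column j of the matrix is byte j of the 64-bit word (bits 8j to 8j+7), and bit j of the vector is v_(j+1). The names `inflate` and `products` are made up for this example.

```c
#include <stdint.h>

/* Assumed layout (not stated in the original): column j of the matrix
   occupies byte j of the 64-bit word, and bit j of v selects that column. */
static inline uint64_t inflate(uint8_t v)
{
    uint64_t out = 0;
    for (int j = 0; j < 8; ++j) {
        if (v & (1u << j))
            out |= (uint64_t)0xFF << (8 * j);  /* repeat bit j eight times */
    }
    return out;
}

/* All eight column-times-scalar products at once: byte j of the result
   is column j if bit j of v is set, and zero otherwise. */
static inline uint64_t products(uint64_t A, uint8_t v)
{
    return A & inflate(v);
}
```

With this layout, `inflate(0x05)` selects bytes 0 and 2, so `products` keeps only the first and third columns of A.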
A simple approach for XORing the products is:
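uint64_t P = A & v_inflated;  /* the products, computed as described above */
uint64_t sum = 0;
for( int i = 8; i; --i )
{
    sum ^= P & 0xFF;  /* fold the lowest byte (one column's product) into the sum */
    P >>= 8;          /* move on to the next byte */
}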
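Since XOR is associative and commutative, the eight bytes can also be folded in three shift-and-XOR steps instead of eight loop iterations. The following is a minimal sketch of that alternative, not something taken from the answer itself; the function name `xor_fold_bytes` is hypothetical.

```c
#include <stdint.h>

/* XOR all eight bytes of P together by halving the width three times. */
static inline uint8_t xor_fold_bytes(uint64_t P)
{
    P ^= P >> 32;  /* low 4 bytes now hold b0^b4, b1^b5, b2^b6, b3^b7 */
    P ^= P >> 16;  /* low 2 bytes now hold the XOR of even and odd bytes */
    P ^= P >> 8;   /* low byte now holds the XOR of all eight bytes */
    return (uint8_t)(P & 0xFF);
}
```

Whether this beats the loop in practice depends on the compiler and target, so it is worth measuring both.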
There are only 256 possible vectors, so use a lookup table to generate the right bitmasks. Your logic will then be something like:
output_bit_n = bool(matrix[n] & lookup[vector])
In other words, your lookup table can transpose an 8-bit value into the 64-bit world.
If the compiler isn't smart enough to optimise (value<<=1)|=result, you can efficiently pack this into the result with rotate-with-carry instructions.
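One possible reading of the lookup-table idea is to precompute the 64-bit mask for each of the 256 vectors and combine it with the byte-wise AND and XOR described earlier. The sketch below does that rather than the per-bit formulation above, and it assumes column j of the matrix is stored in byte j of the 64-bit word. The names `inflate_lut`, `init_inflate_lut` and `mat_vec_mul_gf2` are invented for the example.

```c
#include <stdint.h>

/* 256 precomputed masks: byte j of inflate_lut[v] is 0xFF if bit j of v
   is set, else 0x00. Costs 2 KiB and must be initialised once. */
static uint64_t inflate_lut[256];

static inline void init_inflate_lut(void)
{
    for (int v = 0; v < 256; ++v) {
        uint64_t mask = 0;
        for (int j = 0; j < 8; ++j) {
            if (v & (1 << j))
                mask |= (uint64_t)0xFF << (8 * j);
        }
        inflate_lut[v] = mask;
    }
}

/* Matrix-vector product over GF(2), assuming column j of A is byte j. */
static inline uint8_t mat_vec_mul_gf2(uint64_t A, uint8_t v)
{
    uint64_t P = A & inflate_lut[v];  /* select the columns where v has a 1 */
    P ^= P >> 32;                     /* XOR the selected columns together */
    P ^= P >> 16;
    P ^= P >> 8;
    return (uint8_t)P;
}
```

After calling `init_inflate_lut()` once, multiplying the identity matrix (0x8040201008040201 under this layout) by any vector returns that vector unchanged, which makes a quick sanity check.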