Make sparse matrix multiplication fast

Question

The code is written in C++11. Each process gets two sparse matrices. The test data is available as a download (the link was not preserved).

The test data contains 2 files: a0 (sparse matrix 0) and a1 (sparse matrix 1). Each line in a file is "i j v", meaning that row i, column j of the sparse matrix has the value v. i, j and v are all integers.

I use a C++11 unordered_map as the sparse matrix's data structure:

unordered_map<int, unordered_map<int, double> > matrix1 ;
matrix1[i][j] = v ; // row i, column j of matrix1 holds the value v

The following code takes about 2 minutes. The compile command is g++ -O2 -std=c++11 ./matmult.cpp.

The g++ version is 4.8.1, on openSUSE 13.1. My computer: Intel(R) Core(TM) i5-4200U CPU @ 1.60GHz, 4 GB of memory.

#include <iostream>
#include <fstream>
#include <string>
#include <unordered_map>
#include <vector>
#include <thread>

using namespace std;

void load(string fn, unordered_map<int,unordered_map<int, double> > &m) {
  ifstream input ;
  input.open(fn);
  int i, j ; double v;
  while (input >> i >> j >> v)  {
    m[i][j] = v;
  }
}

unordered_map<int,unordered_map<int, double> > m1;
unordered_map<int,unordered_map<int, double> > m2;
//vector<vector<int> > keys(BLK_SIZE);

int main() {
  load("./a0",m1);
  load("./a1",m2);

  for (auto r1 : m1) {
    for (auto r2 : m2) {
      double sim = 0.0 ;
      for (auto c1 : r1.second) {
        auto f = r2.second.find(c1.first);
        if (f != r2.second.end()) {
           sim += (f->second) * (c1.second) ;
        }
      }
   }
  }
  return 0;
}

The code above is too slow. How can I make it run faster? I tried multithreading. The new code follows. The compile command is g++ -O2 -std=c++11 -pthread ./test.cpp. It takes about 1 minute, and I want it to be faster.

How can I make the task faster? Thank you!

#include <iostream>
#include <fstream>
#include <string>
#include <unordered_map>
#include <vector>
#include <thread>

#define BLK_SIZE 8

using namespace std;

void load(string fn, unordered_map<int,unordered_map<int, double> > &m) {
  ifstream input ;
  input.open(fn);
  int i, j ; double v;
  while (input >> i >> j >> v)  {
    m[i][j] = v;
  }
}

unordered_map<int,unordered_map<int, double> > m1;
unordered_map<int,unordered_map<int, double> > m2;
vector<vector<int> > keys(BLK_SIZE);

void thread_sim(int blk_id) {
  for (auto row1_id : keys[blk_id]) {
    auto r1 = m1.at(row1_id);  // at() is safe to call concurrently; operator[] may insert, so it can race
    for (auto r2p : m2) {
      double sim = 0.0;
      for (auto col1 : r1) {
        auto f = r2p.second.find(col1.first);
        if (f != r2p.second.end()) {
          sim += (f->second) * col1.second ;
        }
      }
    }
  }
}

int main() {

  load("./a0",m1);
  load("./a1",m2);

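  // Rows per block: large enough that all rows of m1 fit into BLK_SIZE blocks (the last block may be short or empty).
  int df = BLK_SIZE - (m1.size() % BLK_SIZE);
  int blk_rows = (m1.size() + df) / (BLK_SIZE - 1);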
  int curr_thread_id  = 0;
  int index = 0;
  for (auto k : m1) {
    keys[curr_thread_id].push_back(k.first);
    index++;
    if (index==blk_rows) {
      index = 0;
      curr_thread_id++;
    }
  }
  cout << "ok" << endl;
  std::thread t[BLK_SIZE];
  for (int i = 0 ; i < BLK_SIZE ; ++i){
    t[i] = std::thread(thread_sim,i);
  }
  for (int i = 0; i< BLK_SIZE; ++i)
    t[i].join();

  return 0 ;
}

Answer 1

When working with sparse matrices, most people use a more efficient representation than the nested maps you have. Typical choices are Compressed Sparse Row (CSR) or Compressed Sparse Column (CSC). See https://en.wikipedia.org/wiki/Sparse_matrix for details.
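The following minimal sketch shows roughly what CSR looks like in C++. It builds a CSR matrix from the (i, j, v) triplets found in a0/a1. The names Triplet, CsrMatrix and build_csr are hypothetical. The sketch makes several assumptions:

* Row indices are already 0-based and contiguous.
* No (i, j) pair appears twice. The question's map would keep only the last value, but this sketch would store both.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical names for illustration; not from the question or any library.
// One nonzero entry as read from a file: row i, column j, value v.
struct Triplet {
  int row;
  int col;
  double val;
};

// Compressed Sparse Row: the nonzeros of row r live in
// col_idx[row_ptr[r] .. row_ptr[r + 1]) and val[row_ptr[r] .. row_ptr[r + 1]),
// sorted by column index.
struct CsrMatrix {
  int rows = 0;
  std::vector<std::size_t> row_ptr;  // rows + 1 offsets
  std::vector<int> col_idx;          // one column index per nonzero
  std::vector<double> val;           // one value per nonzero
};

// Assumes 0 <= row < rows and no duplicate (row, col) pairs.
CsrMatrix build_csr(int rows, std::vector<Triplet> entries) {
  // Order entries by (row, col) so each row's nonzeros become contiguous.
  std::sort(entries.begin(), entries.end(),
            [](const Triplet &a, const Triplet &b) {
              return std::tie(a.row, a.col) < std::tie(b.row, b.col);
            });
  CsrMatrix m;
  m.rows = rows;
  m.row_ptr.assign(static_cast<std::size_t>(rows) + 1, 0);
  m.col_idx.reserve(entries.size());
  m.val.reserve(entries.size());
  for (const Triplet &t : entries) {
    assert(t.row >= 0 && t.row < rows);
    ++m.row_ptr[t.row + 1];  // count nonzeros per row
    m.col_idx.push_back(t.col);
    m.val.push_back(t.val);
  }
  // Prefix sum turns per-row counts into start offsets.
  for (int r = 0; r < rows; ++r) m.row_ptr[r + 1] += m.row_ptr[r];
  return m;
}
```

After both files are in this form, each row's nonzeros occupy two contiguous slices: col_idx and val between row_ptr[r] and row_ptr[r + 1]. These slices are already sorted by column. Row-by-row products can therefore read memory sequentially instead of probing a hash table for every element.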

Answer 2

You haven't specified how fast you expect your example to run or which platform you hope to run on. These are important design constraints in this example.

There are several areas I can think of for improving the efficiency of this:

1. Improve the way the data is stored
2. Improve the multithreading
3. Improve the algorithm

The first point concerns how the sparse arrays are stored and the interfaces used to read the data. Nested unordered_maps are a good option when speed isn't important, but there may be more specific data structures available that are geared toward this problem. At best, you may find a library that stores the data better than nested maps. At worst, you may have to come up with something yourself.
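Here is one concrete example of a more specific structure. The sketch assumes the question's nested maps are still used for loading. It copies each inner map into a vector of (column, value) pairs sorted by column. It then computes the dot product of two rows with a two-pointer merge, replacing one hash lookup per element. The names to_sorted_row and sorted_dot are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical names for illustration; not from the question or any library.
using NestedMap = std::unordered_map<int, std::unordered_map<int, double>>;
// One sparse row as (column, value) pairs sorted by column.
using SortedRow = std::vector<std::pair<int, double>>;

// Copies one inner map of the question's structure into a sorted vector.
SortedRow to_sorted_row(const std::unordered_map<int, double> &row) {
  SortedRow out(row.begin(), row.end());
  std::sort(out.begin(), out.end());  // pairs compare by column first
  return out;
}

// Dot product of two sorted sparse rows via a two-pointer merge:
// O(len(a) + len(b)) sequential reads and no hashing.
double sorted_dot(const SortedRow &a, const SortedRow &b) {
  double sum = 0.0;
  std::size_t i = 0, k = 0;
  while (i < a.size() && k < b.size()) {
    if (a[i].first < b[k].first) {
      ++i;
    } else if (b[k].first < a[i].first) {
      ++k;
    } else {  // same column present in both rows
      sum += a[i].second * b[k].second;
      ++i;
      ++k;
    }
  }
  return sum;
}
```

In the question's loops, you would convert each row once before multiplying. The inner find() loop would then become sorted_dot(row1, row2).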

The second point refers to the way multithreading is supported in the language. The original spec for the multithreading system was meant to be platform independent, so it may miss out on handy features that some systems have. Decide which system you want to target and use that OS's threading system. You'll have more control over how the threading works and may reduce the overhead, but you will lose cross-platform support.
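Before switching to OS-specific threads, it may be worth fixing two things in the std::thread version itself:

* **Row copies.** Range-for loops written as for (auto r2p : m2) declare each element by value, so every inner unordered_map is copied on every iteration.
* **Shared-map access.** Worker threads should only call const member functions such as find() on the shared maps. operator[] may insert, so calling it from several threads at once is a data race.

The sketch below keeps the question's algorithm and std::thread. It iterates by const reference and splits the rows of m1 across threads. Each thread records its results in its own vector, so nothing shared is written. The names parallel_sims and Result are hypothetical. Keeping only nonzero results is an assumption about what the caller needs, made to avoid storing one value for every pair of rows.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical names for illustration; not from the question or any library.
using NestedMap = std::unordered_map<int, std::unordered_map<int, double>>;

// One nonzero result: sum over shared columns of m1[row1][c] * m2[row2][c].
struct Result {
  int row1;
  int row2;
  double sim;
};

// Same triple loop as the question, split across num_threads threads.
// Workers only call const members (begin/end/find), iterate by const
// reference so no inner map is copied, and each appends to its own vector.
std::vector<std::vector<Result>> parallel_sims(const NestedMap &m1,
                                               const NestedMap &m2,
                                               unsigned num_threads) {
  assert(num_threads > 0);
  // Collect (row id, row pointer) for m1 so rows can be split by position.
  std::vector<std::pair<int, const std::unordered_map<int, double> *>> rows1;
  rows1.reserve(m1.size());
  for (const auto &r1 : m1) rows1.emplace_back(r1.first, &r1.second);

  std::vector<std::vector<Result>> out(num_threads);
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < num_threads; ++t) {
    workers.emplace_back([&, t] {
      // Strided split: thread t takes rows t, t + num_threads, ...
      for (std::size_t k = t; k < rows1.size(); k += num_threads) {
        const auto &r1 = *rows1[k].second;
        for (const auto &r2 : m2) {
          double sim = 0.0;
          for (const auto &c1 : r1) {
            auto f = r2.second.find(c1.first);
            if (f != r2.second.end()) sim += f->second * c1.second;
          }
          if (sim != 0.0) out[t].push_back({rows1[k].first, r2.first, sim});
        }
      }
    });
  }
  for (auto &w : workers) w.join();
  return out;
}
```

Compile with -pthread, as in the question. A reasonable thread count is std::max(1u, std::thread::hardware_concurrency()), because hardware_concurrency() may return 0 when the value is unknown.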

The third point will take a bit of work. Given the nature of the data, is the way you're multiplying the matrices really the most efficient way? For example, the loops compare every row of a0 with every row of a1, even though a pair of rows only contributes when they share a column. I'm no expert on these things, but it is something to consider, though it will take a bit of effort.
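Here is one alternative worth trying, using hypothetical names. The question's loops compute sim as the sum over columns c of m1[r1][c] * m2[r2][c], which are the entries of m1 times the transpose of m2. Instead of testing every pair of rows, the sketch does two things:

1. It builds an inverted index over m2 that maps each column to the rows that use it.
2. For each row of m1, it accumulates partial sums only for the rows of m2 that share a column.

The work then scales with the number of matching (row, column) pairs rather than with nnz(m1) times the number of rows of m2. This is the usual row-wise sparse-accumulator approach, not something taken from the question. The name multiply_by_transpose is hypothetical.

```cpp
#include <cassert>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical names for illustration; not from the question or any library.
using NestedMap = std::unordered_map<int, std::unordered_map<int, double>>;
using Product = std::unordered_map<int, std::unordered_map<int, double>>;

// P[r1][r2] = sum over c of m1[r1][c] * m2[r2][c], computed only for pairs
// (r1, r2) that share at least one column.
Product multiply_by_transpose(const NestedMap &m1, const NestedMap &m2) {
  // Inverted index over m2: column c -> every (row2, value) with an entry in c.
  std::unordered_map<int, std::vector<std::pair<int, double>>> by_col;
  for (const auto &r2 : m2)
    for (const auto &c2 : r2.second)
      by_col[c2.first].emplace_back(r2.first, c2.second);

  Product result;
  for (const auto &r1 : m1) {
    std::unordered_map<int, double> acc;  // row2 -> partial sum for this row1
    for (const auto &c1 : r1.second) {
      auto hit = by_col.find(c1.first);
      if (hit == by_col.end()) continue;  // no row of m2 uses this column
      for (const auto &entry : hit->second)
        acc[entry.first] += c1.second * entry.second;  // new keys start at 0.0
    }
    if (!acc.empty()) result[r1.first] = std::move(acc);
  }
  return result;
}
```

After the two load() calls, this would be invoked as multiply_by_transpose(m1, m2). The rows of m1 are independent, so the outer loop can still be split across threads, provided each thread writes to its own result map.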

Lastly, you can always target one specific platform and head into the world of assembly programming. Modern CPUs are complicated beasts, and they can sometimes perform operations in parallel. For example, you may be able to do SIMD operations, or run integer and floating-point operations in parallel. Doing this requires a deep understanding of what's going on, but there are useful tools to help you out. Intel had a tool called VTune (it may be called something else now) that analyses code and highlights potential bottlenecks. Ultimately, you'll want to eliminate the parts of the algorithm where the CPU sits idle waiting for something to happen, such as data arriving from RAM. You can do this by finding something else for the CPU to do, by improving the algorithm, or both.

Ultimately, to improve the overall speed, you need to know what is slowing it down. This generally means knowing how to profile your code and interpret the results. Profilers are the general tool for this, but platform-specific tools are available as well.
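A minimal way to start, before reaching for a full profiler, is to time the phases separately, for example loading a0/a1 versus the multiplication, so you know which one dominates. The helper below, time_ms, is a hypothetical name. It uses std::chrono::steady_clock, which is monotonic.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Hypothetical helper for illustration; not from the question or any library.
// Runs `work` once and returns the elapsed wall-clock time in milliseconds.
template <typename F>
double time_ms(F &&work) {
  auto start = std::chrono::steady_clock::now();
  std::forward<F>(work)();
  auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

For example, double t = time_ms([&] { load("./a0", m1); load("./a1", m2); }); times only the loading step. On Linux, a sampling profiler such as perf (perf record, then perf report) can then show which functions inside the slow phase take the time.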

I know this isn't quite what you wanted, but making code fast is genuinely hard and very time-consuming.

2 answers

Answer 1
0 2015-02-07 10:54:17

Answer 2
0 2015-02-07 11:17:50