
Loading large matrix with Armadillo

I have a very sparse matrix with a density of about 0.01 and dimensions 20000 x 500000. I'm trying to load it in Armadillo with

sp_mat V;
V.load(filename, coord_ascii);

The file format is

row column value

But this is taking far too long. Python can parse the file and fill a dictionary with it much faster than Armadillo can create this matrix. How should I do this properly?

The matrix is going to be filled with integers.

Any advice would be appreciated!
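For a sense of scale, the numbers above can be turned into a rough size estimate. This is only a back-of-the-envelope sketch: the function names are hypothetical, and it assumes compressed sparse column storage with 8-byte values and 8-byte indices (which depends on how Armadillo is configured on a given platform).

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

// Expected number of stored (non-zero) entries for a given density.
std::uint64_t estimated_nonzeros(std::uint64_t rows, std::uint64_t cols, double density) {
    const double cells = static_cast<double>(rows) * static_cast<double>(cols);
    return static_cast<std::uint64_t>(std::llround(cells * density));
}

// Approximate bytes for compressed sparse column (CSC) storage:
// one value and one row index per non-zero, plus (cols + 1) column pointers.
// The 8-byte defaults are assumptions (double values, 64-bit indices).
std::uint64_t csc_bytes(std::uint64_t nnz, std::uint64_t cols,
                        std::size_t value_size = 8, std::size_t index_size = 8) {
    return nnz * (value_size + index_size) + (cols + 1) * index_size;
}
```

Under those assumptions, estimated_nonzeros(20000, 500000, 0.01) gives about 10^8 entries, so the input file has on the order of 10^8 lines and csc_bytes puts the in-memory matrix at roughly 1.6 GB.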

Update:

This is an issue solely with Armadillo. C++ iterates the file without issue when read line by line, but assigning the values into an arma::sp_mat is extremely slow.

The Armadillo documentation states:

"Using batch insertion constructors is generally much faster than consecutively inserting values using element access operators"

So here is the best I could come up with:

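#include <armadillo>
#include <fstream>
#include <vector>

using namespace std;
using namespace arma;

sp_mat get(const char *filename) {
    vector<uword> location_u;   // row indices
    vector<uword> location_m;   // column indices
    vector<double> values;

    // Read "row column value" triplets until the stream fails.
    ifstream file(filename);
    uword a, b;
    double c;
    while (file >> a >> b >> c) {
        location_u.push_back(a);
        location_m.push_back(b);
        values.push_back(c);
    }

    // Build the 2 x N locations matrix expected by the batch constructor:
    // row 0 holds the row indices, row 1 holds the column indices.
    umat lu(location_u);
    umat lm(location_m);
    umat location(join_rows(lu, lm).t());

    // Batch insertion: construct the sparse matrix in one go.
    return sp_mat(location, vec(values));
}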

It now runs at a reasonable speed, at about 1 million lines a second.

I ran into this very same problem today when trying to load a 100 MB CSV file using Armadillo's .load(). It's just too slow.

Since @Enrico Borba reported that doing his own file reading with std::ifstream gave a big speedup, here is my own code to load a CSV file into Armadillo's mat type, also using ifstream.

For example, loading the file like this takes a very long time:

arma::mat A;
A.load("file.csv", arma::csv_ascii);

Here is an alternative, which is about a thousand times faster than the code above:

#include <armadillo>
#include <cstddef>
#include <fstream>
#include <string>
#include <utility>
#include <vector>

arma::mat readCSV(const std::string &filename, const std::string &delimiter = ",")
{
    std::ifstream csv(filename);
    std::vector<std::vector<double>> datas;

    for (std::string line; std::getline(csv, line); ) {

        std::vector<double> data;

        // Split the line on the delimiter and convert each field to double.
        std::size_t start = 0;
        std::size_t end = line.find(delimiter);
        while (end != std::string::npos) {
            data.push_back(std::stod(line.substr(start, end - start)));
            start = end + delimiter.length();
            end = line.find(delimiter, start);
        }
        data.push_back(std::stod(line.substr(start)));  // last field on the line
        datas.push_back(std::move(data));
    }

    // An empty or unreadable file yields an empty matrix.
    if (datas.empty()) {
        return arma::mat();
    }

    arma::mat data_mat = arma::zeros<arma::mat>(datas.size(), datas[0].size());

    // Copy each parsed line into the corresponding matrix row.
    for (std::size_t i = 0; i < datas.size(); i++) {
        data_mat.row(i) = arma::rowvec(datas[i]);
    }

    return data_mat;
}

Then you can use it in place of the .load() call:

arma::mat A = readCSV("file.csv");
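If the per-line splitting still shows up as a hotspot, one option is to parse numbers in place with std::strtod rather than building a substring per field. The following is only a sketch under stated assumptions: parse_csv_line is a hypothetical helper, it handles a single-character delimiter only, and it has not been benchmarked against the version above.

```cpp
#include <cctype>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Parses one delimited line of numbers without creating substrings.
// Throws std::invalid_argument if a field is not a number.
std::vector<double> parse_csv_line(const std::string &line, char delimiter = ',') {
    std::vector<double> fields;
    const char *p = line.c_str();

    while (true) {
        char *end = nullptr;
        const double value = std::strtod(p, &end);
        if (end == p) {
            throw std::invalid_argument("parse_csv_line: expected a number");
        }
        fields.push_back(value);
        p = end;

        // Skip trailing whitespace, e.g. a '\r' left over from CRLF line endings.
        while (*p != '\0' && std::isspace(static_cast<unsigned char>(*p))) {
            ++p;
        }

        if (*p == delimiter) {
            ++p;        // move on to the next field
        } else if (*p == '\0') {
            break;      // end of line
        } else {
            throw std::invalid_argument("parse_csv_line: unexpected character");
        }
    }
    return fields;
}
```

Inside readCSV, the body of the line loop could then become datas.push_back(parse_csv_line(line)), with the rest of the function unchanged.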
