
Why does loading data into a Meteor Collection take so long?

I'm trying to build a data visualization application with Meteor to visualize a large dataset. The data is currently stored in a CSV file of about 64 MB.

I'm using the node-csv module to load this file into a Meteor Collection (code below), but it takes about 1 minute per 10k records. At that rate, loading the whole file into the Collection will take about 1.5 hours, and during that time the Meteor server is unresponsive to web requests.

This seems abnormally slow to me. Is this normal? Is Meteor just not designed to handle moderately large amounts of data? Or is there a better way to import the data than the one I found?

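// Meteor.require comes from the npm smart package; npm module names are lowercase.
var csv = Meteor.require('csv');
// fs and path are Node core modules, so Npm.require is enough for both.
var fs = Npm.require('fs');
var path = Npm.require('path');

// Assumes Emails is a server-visible collection defined elsewhere.
function loadData() {
  // On the server the working directory lives under .meteor/, so cut the
  // path there to get the project root (ending in a slash).
  var basepath = path.resolve('.').split('.meteor')[0];
  console.log('Loading data into Meteor...');

  csv().from.stream(
      fs.createReadStream(basepath + 'server/data/enron_data.csv'),
      {'escape': '\\'})
    // Collection calls on the server must run inside a Fiber, so the
    // per-record callback is wrapped with Meteor.bindEnvironment.
    .on('record', Meteor.bindEnvironment(function (row, index) {
      if ((index % 10000) === 0) {
        console.log('Processing:', index, row);
      }
      // One insert per CSV record.
      Emails.insert({
        'sender_id': row[0],
        'recipient_id': row[1],
        'recipient_type': row[2],
        'date': row[3],
        'timezone': row[4],
        'subject': row[5]
      });
    }, function (error) {
      // Second argument to bindEnvironment: called if the callback throws.
      console.log('Error in bindEnvironment:', error);
    }))
    .on('error', function (err) {
      console.log('Error reading CSV:', err);
    })
    .on('end', function (count) {
      console.log(count, 'records read');
    });
}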

Even outside of Meteor, loading your data one row at a time is very inefficient. I think the tool you want is mongoimport.
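A minimal sketch of what a mongoimport call for this file could look like, written as a small JavaScript helper that only builds the command-line arguments (the helper name is hypothetical). Port 3001 and database `meteor` are assumptions based on the defaults of a local `meteor run` on port 3000; the collection name `emails` is also an assumption. It further assumes the file has no header row and uses ordinary double-quote CSV quoting. The question's file is read with a backslash escape, which mongoimport may not understand, so the file might need converting first.

```javascript
// Hypothetical helper: build the argument list for a mongoimport run.
// Defaults assume Meteor's local development database (localhost:3001/meteor).
function buildMongoimportArgs({ host = 'localhost', port = 3001, db = 'meteor',
                                collection, file, fields }) {
  return [
    '--host', host,
    '--port', String(port),
    '--db', db,
    '--collection', collection,
    '--type', 'csv',
    // No header row assumed, so the field names are listed explicitly.
    '--fields', fields.join(','),
    '--file', file,
  ];
}

const args = buildMongoimportArgs({
  collection: 'emails', // assumed name given to the Meteor collection
  file: 'server/data/enron_data.csv',
  fields: ['sender_id', 'recipient_id', 'recipient_type', 'date', 'timezone', 'subject'],
});

// Print the equivalent shell command.
console.log(['mongoimport', ...args].join(' '));
```

The printed command can be run directly from a shell while `meteor` is running. mongoimport writes the rows in bulk rather than one insert per record.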

It may not be obvious, but you do not need to insert your documents through Meteor in order to use them from Meteor.

You can try calling mongoimport from Meteor.startup when there are 0 documents in your collection (or whatever base condition makes sense in your situation). I haven't tried this, so I can't say how much of a pain it is, but I'd imagine you could just call child_process.spawn to start mongoimport. If for some reason that doesn't work, you could always put it in a script and run that script whenever you do a meteor reset.
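A hedged sketch of the spawn-on-startup idea. `importIfEmpty` is a hypothetical helper. The document count would come from something like `Emails.find().count()`, and the spawn function is passed in so it can be replaced in tests. It uses Node's `require('child_process')`; in Meteor server code of that era you would likely obtain the module with `Npm.require('child_process')` instead.

```javascript
const { spawn } = require('child_process');

// Hypothetical helper: start mongoimport only when the collection is empty.
// `spawnFn` defaults to child_process.spawn(command, args, options).
function importIfEmpty(count, args, spawnFn = spawn) {
  if (count > 0) {
    console.log('Collection already has', count, 'documents; skipping import');
    return null;
  }
  // stdio: 'inherit' shows mongoimport's progress in the server console.
  const child = spawnFn('mongoimport', args, { stdio: 'inherit' });
  // 'error' fires if mongoimport cannot be started (e.g. not on PATH).
  child.on('error', (err) => console.log('Could not start mongoimport:', err));
  child.on('close', (code) => console.log('mongoimport exited with code', code));
  return child;
}

// In a Meteor server file this might look roughly like
// (importArgs being an argument list such as the one built above):
// Meteor.startup(function () {
//   importIfEmpty(Emails.find().count(), importArgs);
// });
```

The callbacks here only log. If they needed to touch a collection, they would have to be wrapped with `Meteor.bindEnvironment` like the CSV callback in the question.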

Side note: I believe the appropriate place for static server assets is the private directory. This also lets you use the Assets API (for example Assets.getText) to access those files.
