Why does it take so long to load data into a Meteor Collection?

Question

I'm trying to build a data visualization application using Meteor to visualize a large dataset. The data is currently stored in a CSV file of about 64MB.

I'm using the node-csv plugin to load this data file into a Meteor Collection (code below). But it's taking about 1 minute per 10k records, and at that rate it will take about 1.5 hours to load the whole file into the Collection. During that time, the Meteor server is unresponsive to web requests.

This seems abnormally slow to me. Is this normal? Is Meteor just not designed to handle moderately large amounts of data? Or is there a better way to do this data import than the way I found?

var csv = Meteor.require('CSV');
var fs = Meteor.require('fs');
var path = Npm.require('path');

function loadData() {
  var basepath = path.resolve('.').split('.meteor')[0];
  console.log('Loading data into Meteor...');

  csv().from.stream(
    fs.createReadStream(basepath+'server/data/enron_data.csv'),
      {'escape': '\\'})
    .on('record', Meteor.bindEnvironment(function(row, index) {
      if ((index % 10000) == 0) {
        console.log('Processing:', index, row);
      }
      Emails.insert({
        'sender_id': row[0],
        'recipient_id': row[1],
        'recipient_type': row[2],
        'date': row[3],
        'timezone': row[4],
        'subject': row[5]
        })
      }, function(error) {
          console.log('Error in bindEnvironment:', error);
      }
    ))
    .on('error', function(err) {
      console.log('Error reading CSV:', err);
    })
    .on('end', function(count) {
      console.log(count, 'records read');
    });
}

Answer 1

Even if you do this outside of the Meteor environment, loading your data one row at a time is really inefficient. I think the tool you want is mongoimport.

It may not be obvious, but you do not need to insert your documents through Meteor in order to use Meteor with those documents.

You can try calling mongoimport from Meteor.startup when there are 0 documents in your collection (or whatever base condition makes sense in your situation). I haven't tried this so I can't say how much of a pain it is, but I'd imagine you could just call child_process.spawn to start mongoimport. If for some reason that doesn't work, you could always put it in a script and run that script whenever you do a meteor reset.
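A rough, untested sketch of the child_process.spawn idea, in plain Node so it can be tried outside Meteor first. The field names come from the insert call in the question. Everything else is an assumption for illustration: the helper names buildMongoimportArgs and runMongoimport are made up, and the host, port, database and collection values have to match your own MongoDB (a local development Meteor app usually runs its database on the app port plus one, e.g. 127.0.0.1:3001, database "meteor"). Also note that mongoimport may not treat backslash escapes the way the node-csv 'escape' option in the question does, so check a sample of the imported rows.

```javascript
var spawn = require('child_process').spawn; // inside Meteor server code, Npm.require would be used instead

// Column order taken from the Emails.insert call in the question.
var FIELDS = ['sender_id', 'recipient_id', 'recipient_type',
              'date', 'timezone', 'subject'];

// Hypothetical helper: builds the mongoimport argument list.
// All connection values are supplied by the caller (assumptions, not defaults).
function buildMongoimportArgs(opts) {
  return [
    '--host', opts.host,
    '--port', String(opts.port),
    '--db', opts.db,
    '--collection', opts.collection,
    '--type', 'csv',
    '--fields', FIELDS.join(','), // the CSV is assumed to have no header row
    '--file', opts.file
  ];
}

// Hypothetical helper: runs mongoimport and reports success or failure once.
function runMongoimport(opts, done) {
  var finished = false;
  function finish(err) {
    if (!finished) {
      finished = true;
      done(err);
    }
  }
  var child = spawn('mongoimport', buildMongoimportArgs(opts), { stdio: 'inherit' });
  child.on('error', finish); // e.g. mongoimport is not on the PATH
  child.on('close', function (code) {
    finish(code === 0 ? null : new Error('mongoimport exited with code ' + code));
  });
}
```

In the app, the call would sit inside Meteor.startup behind the empty-collection check described above, something like `if (Emails.find().count() === 0) { runMongoimport({...}, callback); }`. Because the import runs in a separate process, the Meteor server should stay free to answer web requests while it loads.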

Side note - I believe the appropriate place for your static server assets is the private directory. This also lets you use the Assets API to access those files.

Answer 1: score 4, accepted, 2013-10-23 20:21:30
