
Streaming over 1m records from SQL Server to MongoDB using Node.js

I'm trying to copy 8,000,000 rows of data from Microsoft SQL Server into MongoDB. It works great for 100,000 records, but when I try to pull 1,000,000 records (or all of them), I get the following error:

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory

Here's the code (CoffeeScript) I'm currently using:

MsSqlClient   = require 'mssql'
MongoClient = require('mongodb').MongoClient

config = {}
config.mongodb = 'mongodb://localhost:27017/dbname'
config.mssql = 'mssql://user::pass@host/dbname'

Promise.all(
  [
    MongoClient.connect config.mongodb
    MsSqlClient.connect config.mssql
  ]
).then (a) ->
  mongo = a[0]
  sql = a[1]

  collection = mongo.collection "collection_name"

  request = new MsSqlClient.Request()
  request.stream = true

  request.on 'row', (row) ->
    # one insert per row, fire-and-forget; nothing here slows the SQL stream down
    collection.insert(row)

  request.on 'done', (affected) ->
    console.log "Completed"

  sql.on 'error', (err) ->
    console.log err

  console.log "Querying"
  request.query("SELECT * FROM big_table")

.catch (err) ->
  console.log "ERROR: ", err

It seems that the write to MongoDB is taking far longer than the download from SQL Server, which I believe is causing the bottleneck. Is there a way to slow down (pause/resume) the stream from SQL Server so I can pull and write in chunks, without adding an index column to the SQL data and selecting by row number?

Running:

  • Windows 7, SQL Server 2012 (SP1), MongoDB 2.8.0
  • Node.js 4.2.4 / mssql 3.3.0 / mongodb 2.1.19
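
For reference, this is roughly the flow I have in mind. It's an untested sketch that assumes the streaming request exposes pause()/resume() (newer mssql releases document these; I haven't confirmed 3.3.0 does) and batches rows into insertMany calls so memory stays bounded:

batchSize = 1000
batch = []

request.on 'row', (row) ->
  batch.push row
  if batch.length >= batchSize
    # stop SQL Server from pushing more rows while Mongo catches up
    request.pause()
    rows = batch
    batch = []
    collection.insertMany rows, (err) ->
      console.log "insert error:", err if err
      # resume the stream once this chunk has been written
      request.resume()

request.on 'done', ->
  # flush whatever is left in the final partial batch
  collection.insertMany(batch) if batch.length > 0
  console.log "Completed"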

You could do it in blocks (50,000 rows, for example). Here is a way (SQL side only) you could do it (not super fast, but it should work):

Get the number of blocks first; you have to loop over this number outside of SQL:

    -- get blocks

    select count(*) / 50000 as NumberOfBlocksToLoop
    from YOUR.TABLE

Get one block, where ColumnU is a column that allows you to sort your table (alternatively, you could use an ID column directly, but then you might have gaps if data is being deleted from the table):

    -- get first n-block

    declare @BlockNumber int

    set @BlockNumber = 1

    select ColumnX
    from
    (
        select row_number() over (order by ColumnU asc) as RowNumber,
        TABLE.ColumnX
        from YOUR.TABLE
    ) Data
    where RowNumber between ((@BlockNumber - 1) * 50000) + 1 and @BlockNumber * 50000

Try to find a good block size (it depends on your system, of course) to avoid running into the out-of-memory exception again. You should also catch that exception and then, depending on how much effort you want to invest, either delete the already transferred data and restart, or calculate a smaller block size (a bit more difficult) and continue transferring the rest.
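
If it helps, here is a rough, untested sketch of the driver-side loop this implies, written in the same CoffeeScript setup as the question. It assumes mssql's promise API resolves with the recordset array (as the 3.x line in the question does); ColumnX, ColumnU and YOUR.TABLE are the placeholders from the queries above:

    BLOCK_SIZE = 50000

    # Fetch block `blockNumber` with the ROW_NUMBER query above, write it to Mongo,
    # then move on to the next block until totalBlocks have been copied.
    copyBlock = (collection, blockNumber, totalBlocks) ->
      return Promise.resolve() if blockNumber > totalBlocks
      blockQuery = """
        SELECT ColumnX
        FROM
        (
            SELECT ROW_NUMBER() OVER (ORDER BY ColumnU ASC) AS RowNumber,
            ColumnX
            FROM YOUR.TABLE
        ) Data
        WHERE RowNumber BETWEEN ((#{blockNumber} - 1) * #{BLOCK_SIZE}) + 1 AND #{blockNumber} * #{BLOCK_SIZE}
      """
      request = new MsSqlClient.Request()
      request.query(blockQuery).then (rows) ->
        # one bounded batch per block keeps Node's memory usage flat
        collection.insertMany(rows).then ->
          console.log "block #{blockNumber}/#{totalBlocks} done"
          copyBlock collection, blockNumber + 1, totalBlocks

    # Count the blocks (integer division, so +1 covers the remainder), then loop.
    transferAll = (collection) ->
      request = new MsSqlClient.Request()
      request.query("SELECT COUNT(*) / #{BLOCK_SIZE} AS NumberOfBlocksToLoop FROM YOUR.TABLE").then (rows) ->
        copyBlock collection, 1, rows[0].NumberOfBlocksToLoop + 1

Calling transferAll(collection) would replace the streaming request in the question's .then block. The interpolated values here are plain integers; anything user-supplied should go through request.input parameters instead.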
