I'm trying to copy 8,000,000 rows of data from Microsoft SQL Server into MongoDB. It works great for 100,000 records, but when I try to pull 1,000,000 records (or all of them), I get the following error:
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory
Here's the code (CoffeeScript) I'm currently using:
MsSqlClient = require 'mssql'
MongoClient = require('mongodb').MongoClient

config = {}
config.mongodb = 'mongodb://localhost:27017/dbname'
config.mssql = 'mssql://user:pass@host/dbname'

Promise.all([
  MongoClient.connect config.mongodb
  MsSqlClient.connect config.mssql
]).then (a) ->
  mongo = a[0]
  sql = a[1]

  collection = mongo.collection "collection_name"

  request = new MsSqlClient.Request()
  request.stream = true

  request.on 'row', (row) ->
    collection.insert(row)

  request.on 'done', (affected) ->
    console.log "Completed"

  sql.on 'error', (err) ->
    console.log err

  console.log "Querying"
  request.query("SELECT * FROM big_table")
.catch (err) ->
  console.log "ERROR: ", err
It seems that the write to MongoDB is taking much longer than the download from SQL Server, which I believe is causing the bottleneck. Is there a way to slow down (pause/resume) the stream from SQL Server so I can pull and write in chunks, without adding an index column to the SQL data and selecting by row number?
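The pause/resume pattern the question asks about can be sketched independently of the drivers. Below is a minimal sketch in plain JavaScript (which CoffeeScript compiles to), assuming a batch size and `pause()`/`resume()` callbacks for the source stream; newer versions of the mssql package document `request.pause()` and `request.resume()` in stream mode, but verify against the version you're running:

```javascript
// Buffers incoming rows and flushes them in batches, pausing the source
// while a batch is being written so the writes can keep up.
// Assumptions: batchSize, and flush/pause/resume callbacks supplied by the caller.
class Batcher {
  constructor(batchSize, flush, pause, resume) {
    this.batchSize = batchSize;
    this.flush = flush;    // async (rows) => Promise, e.g. bulk insert into MongoDB
    this.pause = pause;    // e.g. () => request.pause()
    this.resume = resume;  // e.g. () => request.resume()
    this.rows = [];
    this.pending = Promise.resolve();
  }

  push(row) {
    this.rows.push(row);
    if (this.rows.length >= this.batchSize) {
      const batch = this.rows;
      this.rows = [];
      this.pause();                      // stop the source while we write
      this.pending = this.pending
        .then(() => this.flush(batch))
        .then(() => this.resume());      // let the source flow again
    }
  }

  end() {                                // flush whatever is left at the end
    const batch = this.rows;
    this.rows = [];
    return this.pending.then(() => (batch.length ? this.flush(batch) : null));
  }
}
```

Wired into the question's code, the row handler would become `request.on 'row', (row) -> batcher.push(row)`, with flush set to something like `(rows) -> collection.insertMany(rows)` and `end()` called from the `'done'` handler.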
You could do it in blocks (50,000 rows, for example). Here's one way to do it (SQL side only); it's not super fast, but it should work.

First, get the number of blocks; you loop over this number outside of SQL:
-- get number of blocks (round up so a partial last block is included)
select ceiling(count(*) / 50000.0) as NumberOfBlocksToLoop
from YOUR.TABLE
Then get each block, where ColumnU is a column that lets you sort your table (alternatively, you could use an ID column directly, but then you might have gaps if rows have been deleted from the table):
-- get first n-block
declare @BlockNumber int
set @BlockNumber = 1

select ColumnX
from
(
    select row_number() over (order by ColumnU asc) as RowNumber,
           TABLE.ColumnX
    from YOUR.TABLE
) Data
where RowNumber between ((@BlockNumber - 1) * 50000) + 1 and @BlockNumber * 50000
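The outer loop over block numbers could be sketched as follows (a hypothetical `blockRanges` helper in plain JavaScript); it computes the same row-number ranges as the BETWEEN clause above, rounding up so a partial last block is not lost to integer division:

```javascript
// Compute the RowNumber range of each block covering totalRows rows.
// Each range maps directly onto the BETWEEN clause of the blocked query.
function blockRanges(totalRows, blockSize) {
  const blocks = Math.ceil(totalRows / blockSize); // round up: include the partial last block
  const ranges = [];
  for (let b = 1; b <= blocks; b++) {
    ranges.push({ first: (b - 1) * blockSize + 1, last: b * blockSize });
  }
  return ranges;
}
```

The caller would iterate over these ranges, substituting `first` and `last` into the query (or `@BlockNumber` itself) for each pass.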
Try to find a good size for your block (it depends on your system, of course) to avoid running into the out-of-memory exception again. You should catch the exception and then, depending on how much effort you want to invest, either delete the already-transferred data and restart, or calculate a smaller block size (a bit more difficult) and continue with the rest of the transfer.
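The catch-and-shrink strategy could be sketched like this (assuming a hypothetical `transferBlock(first, last)` helper that runs the blocked query and writes the rows to MongoDB): on failure, the block size is halved and the same range is retried, so progress is never lost.

```javascript
// Transfer all rows in blocks, halving the block size whenever a block fails
// (e.g. with an out-of-memory error) and retrying the same range.
async function transferAll(totalRows, initialBlockSize, transferBlock) {
  let blockSize = initialBlockSize;
  let nextRow = 1; // first RowNumber not yet transferred
  while (nextRow <= totalRows) {
    try {
      await transferBlock(nextRow, Math.min(nextRow + blockSize - 1, totalRows));
      nextRow += blockSize;             // advance only after a successful block
    } catch (err) {
      // failure: shrink the block and retry the same range
      blockSize = Math.max(1, Math.floor(blockSize / 2));
    }
  }
  return Math.min(nextRow - 1, totalRows); // rows transferred
}
```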