
How to bulkCreate millions of rows of data?

I tried to call the Model.bulkCreate method to insert four million rows into the items table, but got the error below:

Executing (default): DROP TABLE IF EXISTS "items" CASCADE;
Executing (default): DROP TABLE IF EXISTS "items" CASCADE;
Executing (default): CREATE TABLE IF NOT EXISTS "items" ("id"   SERIAL , "tenant_id" UUID NOT NULL, PRIMARY KEY ("id"));
Executing (default): SELECT i.relname AS name, ix.indisprimary AS primary, ix.indisunique AS unique, ix.indkey AS indkey, array_agg(a.attnum) as column_indexes, array_agg(a.attname) AS column_names, pg_get_indexdef(ix.indexrelid) AS definition FROM pg_class t, pg_class i, pg_index ix, pg_attribute a WHERE t.oid = ix.indrelid AND i.oid = ix.indexrelid AND a.attrelid = t.oid AND t.relkind = 'r' and t.relname = 'items' GROUP BY i.relname, ix.indexrelid, ix.indisprimary, ix.indisunique, ix.indkey ORDER BY i.relname;

<--- Last few GCs --->

[1599:0x103001000]    53198 ms: Mark-sweep 1396.2 (1424.2) -> 1395.6 (1423.2) MB, 1951.8 / 0.0 ms  (average mu = 0.082, current mu = 0.019) allocation failure scavenge might not succeed
[1599:0x103001000]    53210 ms: Scavenge 1396.4 (1423.2) -> 1395.8 (1423.7) MB, 4.2 / 0.0 ms  (average mu = 0.082, current mu = 0.019) allocation failure 
[1599:0x103001000]    53224 ms: Scavenge 1396.6 (1423.7) -> 1396.0 (1424.2) MB, 7.3 / 0.0 ms  (average mu = 0.082, current mu = 0.019) allocation failure 


<--- JS stacktrace --->

==== JS stack trace =========================================

    0: ExitFrame [pc: 0x1e2d4c25be3d]
Security context: 0x3a2af6f9e6e9 <JSObject>
    1: /* anonymous */(aka /* anonymous */) [0x3a2aed282249] [/Users/ldu020/workspace/github.com/mrdulin/node-sequelize-examples/node_modules/sequelize/lib/model.js:~2551] [pc=0x1e2d4c8edaa1](this=0x3a2a987026f1 <undefined>,values=0x3a2acd876e81 <Object map = 0x3a2aef0e87a1>)
    2: arguments adaptor frame: 3->1
    3: StubFrame [pc: 0x1e2d4c226699]
    4: m...

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 0x10003cff5 node::Abort() [/usr/local/bin/node]
 2: 0x10003d1ff node::OnFatalError(char const*, char const*) [/usr/local/bin/node]
 3: 0x1001b8265 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/usr/local/bin/node]
 4: 0x1005861c2 v8::internal::Heap::FatalProcessOutOfMemory(char const*) [/usr/local/bin/node]
 5: 0x100588c95 v8::internal::Heap::CheckIneffectiveMarkCompact(unsigned long, double) [/usr/local/bin/node]
 6: 0x100584b3f v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [/usr/local/bin/node]
 7: 0x100582d14 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/usr/local/bin/node]
 8: 0x10058f5ac v8::internal::Heap::AllocateRawWithLigthRetry(int, v8::internal::AllocationSpace, v8::internal::AllocationAlignment) [/usr/local/bin/node]
 9: 0x10058f62f v8::internal::Heap::AllocateRawWithRetryOrFail(int, v8::internal::AllocationSpace, v8::internal::AllocationAlignment) [/usr/local/bin/node]
10: 0x10055ef74 v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationSpace) [/usr/local/bin/node]
11: 0x1007e7254 v8::internal::Runtime_AllocateInNewSpace(int, v8::internal::Object**, v8::internal::Isolate*) [/usr/local/bin/node]
12: 0x1e2d4c25be3d 
[1]    1599 abort      npx ts-node 

Here is the code:

import { Model, DataTypes } from 'sequelize';
import { sequelize } from '../../db';
import faker from 'faker';

class Item extends Model {}
Item.init(
  {
    tenant_id: {
      type: DataTypes.UUID,
      allowNull: false,
    },
  },
  { sequelize, tableName: 'items', modelName: 'item' },
);

(async function() {
  try {
    await sequelize.sync({ force: true });
    // seed: build all 4,000,000 row objects in memory at once,
    // then pass them to a single bulkCreate call
    const items = Array.from({ length: 4000 * 1000 }).map((_) => ({ tenant_id: faker.random.uuid() }));
    await Item.bulkCreate(items, { benchmark: true });
  } catch (error) {
    console.log(error);
  } finally {
    await sequelize.close();
  }
})();
  • sequelize: ^5.21.3
  • PostgreSQL: 9.6
  • ts-node: ^8.10.2
  • typescript: ^3.9.6

The code is written in TypeScript and I run the script with the npx ts-node index.ts command.

Is this a Sequelize problem or a PostgreSQL problem? Is there any limit on the amount of data the Model.bulkCreate method can handle?

How should I solve this?

UPDATE

I changed the code to insert in chunks of 1000 rows:

for (let i = 0; i < 4000; i++) {
  // build and insert one 1000-row chunk at a time;
  // returning: false skips reading the inserted rows back into memory
  const items = Array.from({ length: 1000 }).map((_) => ({ tenant_id: faker.random.uuid() }));
  await Item.bulkCreate(items, { benchmark: true, returning: false });
}

It works, but it took 10 minutes to insert the four million rows. How should I choose the chunk size to achieve the maximum insertion speed?
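One way I can compare candidate sizes is to measure a smaller sample first. A sketch (the timeChunkSize helper is hypothetical; it runs inside the async IIFE above):

// Hypothetical helper: inserts `total` rows in chunks of `chunkSize`
// and returns the observed throughput in rows per second.
async function timeChunkSize(chunkSize: number, total = 100 * 1000): Promise<number> {
  const start = Date.now();
  for (let i = 0; i < total; i += chunkSize) {
    const chunk = Array.from({ length: chunkSize }).map(() => ({
      tenant_id: faker.random.uuid(),
    }));
    await Item.bulkCreate(chunk, { returning: false });
  }
  return (total / (Date.now() - start)) * 1000;
}

for (const size of [500, 1000, 5000, 10000]) {
  console.log(`${size} rows/chunk: ${Math.round(await timeChunkSize(size))} rows/s`);
}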

PostgreSQL can generate UUIDs for you (pgcrypto, uuid-ossp, or natively in PostgreSQL 13), saving all of the client-side memory, serialization, and parsing:

INSERT INTO items (tenant_id)
  SELECT gen_random_uuid() FROM generate_series(1, 4000 * 1000)

(this can also be done in chunks)
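From Sequelize, this server-side insert can be issued as a raw query. A minimal sketch, assuming the pgcrypto extension can be enabled (it provides gen_random_uuid() on PostgreSQL 9.6; enabling it needs sufficient privileges):

// Generate and insert all rows inside the database; no row data
// crosses the wire or touches the Node heap.
await sequelize.query('CREATE EXTENSION IF NOT EXISTS pgcrypto;');
await sequelize.query(
  `INSERT INTO items (tenant_id)
   SELECT gen_random_uuid() FROM generate_series(1, 4000 * 1000);`
);

To chunk it server-side, run the same statement repeatedly with smaller generate_series ranges (1 to 1000000, then 1000001 to 2000000, and so on).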

It looks like the Node process simply ran out of memory. As others have said, you can either chunk the data you're loading or raise the V8 heap limit. Note that flags placed after the script name (as in npx ts-node index.ts --max_old_space_size=4096) are passed to your script, not to Node, so pass the flag to Node itself:

node --max-old-space-size=4096 -r ts-node/register index.ts

or set it through an environment variable:

NODE_OPTIONS=--max-old-space-size=4096 npx ts-node index.ts

But this assumes you have 4GB of RAM available.
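To confirm the larger limit actually took effect, you can print V8's configured heap limit with Node's built-in v8 module (the value should come out at or slightly above 4096 MB):

NODE_OPTIONS=--max-old-space-size=4096 node -p "require('v8').getHeapStatistics().heap_size_limit / 1024 / 1024"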
