How to efficiently insert a bulk data set into a MySQL database

I have a question regarding a Symfony 5 command class and how to efficiently insert ca. 10 million entries (only one entity with a Uuid field and no relations to other entities). The whole exercise has no purpose of its own; it is only needed to run some tests with Elasticsearch.

Right now, inserting the data works fine, but it takes hours (about 20k entries per hour).

    for ($i = 0; $i < $numberOfVochers; $i++) {

        $voucher = new Voucher();
        $voucher->setCode(Uuid::v4());
        $voucher->setValid(new DateTime());
      
        $this->em->persist($voucher);
        $this->em->flush();
    }

What am I supposed to do (apart from getting rid of my hardware: MacBook Pro, 2.3 GHz Intel Core i5, 8 GB) to make this job faster?

For one thing, you should flush in batches instead of after every single entity, e.g. like this:

    for ($i = 0; $i < $numberOfVochers; $i++) {
        $voucher = new Voucher();
        $voucher->setCode(Uuid::v4());
        $voucher->setValid(new DateTime());

        $this->em->persist($voucher);
        // Flush once per 100 persisted vouchers, not on every iteration
        if (($i + 1) % 100 === 0) {
            $this->em->flush();
        }
    }
    $this->em->flush(); // flush the remainder in case the last batch was incomplete

Additionally, you should call $this->em->clear() after each flush. This detaches the entities that were just written, so the entity manager does not keep all of them in memory. In your case $voucher does not rely on previously inserted data, so clear() should not pose any issues.
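
To make the flush-then-clear pattern concrete, here is a minimal sketch. The helper name insertInBatches and the $factory callable are my own invention for illustration, not part of Doctrine. $em is typed as a plain object so the snippet runs without Doctrine installed; in a real command you would pass your entity manager, which provides persist(), flush() and clear().

```php
<?php
declare(strict_types=1);

/**
 * Persist $total objects, flushing and clearing after every $batchSize objects.
 *
 * Assumption: $em is anything with persist(), flush() and clear(),
 * e.g. a Doctrine entity manager. $factory returns one new entity per call.
 */
function insertInBatches(object $em, callable $factory, int $total, int $batchSize = 100): void
{
    for ($i = 1; $i <= $total; $i++) {
        $em->persist($factory());

        if ($i % $batchSize === 0) {
            $em->flush(); // write the pending batch to the database
            $em->clear(); // detach the written entities to keep memory flat
        }
    }

    // Write whatever is left when $total is not a multiple of $batchSize.
    $em->flush();
    $em->clear();
}
```

In the command from the question, $factory would be a closure that builds a Voucher with a fresh Uuid. The batch size of 100 is only a starting point; it is worth trying larger values and measuring.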

Since the work is now done in batches, you can also partition it: instead of calling your command once for all vouchers, start the process 4 times, each creating 1/4 of the vouchers. Then you have 4 processes doing the inserts, which usually speeds things up because each process can run on a different CPU core. In your case each voucher can be created independently, so this should not be much work. In other cases you will probably have to tailor your command so that the work can be partitioned properly.
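
As a rough sketch of the partitioning step, the function below splits a total into per-process counts that differ by at most one. The name partitionCounts is hypothetical, and how you pass each count to your command (for example as a console argument) is up to you.

```php
<?php
declare(strict_types=1);

/**
 * Split $total items into $parts counts whose sizes differ by at most one.
 *
 * @return int[] number of items each process should create
 */
function partitionCounts(int $total, int $parts): array
{
    if ($total < 0 || $parts < 1) {
        throw new InvalidArgumentException('Need $total >= 0 and $parts >= 1.');
    }

    $base = intdiv($total, $parts); // every process gets at least this many
    $remainder = $total % $parts;   // the first $remainder processes get one extra

    $counts = [];
    for ($p = 0; $p < $parts; $p++) {
        $counts[] = $base + ($p < $remainder ? 1 : 0);
    }

    return $counts;
}
```

For 10 million vouchers and 4 processes this gives four counts of 2,500,000 each; you would then start the command once per count.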

Alternatively, you can use threading in your command (which I can't recommend), or use something like the Symfony Messenger component to split the task into batches: send one message per batch and let a number of workers process the messages.
