
How do I prevent running out of memory when inserting a million rows in MySQL with PHP

I have built a script in Laravel that reads a JSON file line by line and imports the contents into my database.

However, when running the script, I get an out of memory error after inserting about 80K records.

mmap() failed: [12] Cannot allocate memory

mmap() failed: [12] Cannot allocate memory
PHP Fatal error:  Out of memory (allocated 421527552) (tried to allocate 12288 bytes) in /home/vagrant/Code/sandbox/vendor/laravel/framework/src/Illuminate/Database/Query/Builder.php on line 1758

mmap() failed: [12] Cannot allocate memory
PHP Fatal error:  Out of memory (allocated 421527552) (tried to allocate 32768 bytes) in /home/vagrant/Code/sandbox/vendor/symfony/debug/Exception/FatalErrorException.php on line 1

I have built a sort of makeshift queue that only commits the collected items every 100 rows, but this made no difference.

This is what the part of my code that does the inserts looks like:

public function callback($json) {

    if ($json) {

        // Buffer the decoded line until a full batch has been collected
        $this->queue[] = [
            'type'       => serialize($json['type']),
            'properties' => serialize($json['properties']),
            'geometry'   => serialize($json['geometry'])
        ];

        // Every $queueLength rows, flush the buffer as a single bulk insert
        if (count($this->queue) == $this->queueLength) {

            DB::table('features')->insert($this->queue);

            $this->queue = [];
        }
    }
}

It's the actual inserts ( DB::table('features')->insert( $this->queue ); ) that are causing the error; if I leave those out, I can iterate over all lines and echo them out without any performance issues.

I guess I could allocate more memory, but I doubt this would be a solution, because I'm trying to insert 3 million records and it's currently already failing after 80K with 512 MB of memory allocated. Furthermore, I actually want to run this script on a low-budget server.

The time it takes for this script to run is not of any concern, so if I could somehow slow down the insertion of records, that would be a solution I could settle for.

If you use MySQL 5.7+, it has a feature for importing data from a JSON file using LOAD DATA INFILE.

LOAD DATA INFILE 'afile.json' INTO TABLE atable (field1, field2, ....);

If you use a lower MySQL version, you need to convert the JSON into CSV format first, for example using https://github.com/danmandle/JSON2CSV :

LOAD DATA INFILE 'afile.csv' INTO TABLE atable (field1, field2, ....);

See the LOAD DATA INFILE documentation.
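
If you would rather do the conversion in PHP yourself, a minimal sketch could look like the following. It assumes one JSON object per line with 'type', 'properties' and 'geometry' keys, mirroring the columns from the question; the file names are placeholders.

<?php
// Convert a line-delimited JSON file into a CSV that LOAD DATA INFILE can read.
$in  = fopen('afile.json', 'r');
$out = fopen('afile.csv', 'w');

while (($line = fgets($in)) !== false) {
    $row = json_decode($line, true);
    if ($row === null) {
        continue; // skip blank or malformed lines
    }
    // Write one CSV row per JSON object, serialized like the question's insert
    fputcsv($out, [
        serialize($row['type']),
        serialize($row['properties']),
        serialize($row['geometry']),
    ]);
}

fclose($in);
fclose($out);

Because only one line is held in memory at a time, the conversion itself keeps a small, constant memory footprint.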

Check to make sure your query logging is off. If query logging is on, even though you are clearing out $this->queue, the query log keeps growing until it reaches an unmanageable size and you hit your memory limit.

To check if query logging is on or off, use the DB facade like this:

DB::logging()

If this returns true, then your logging is on, and that's your problem. If this returns false, then it's some other configuration that is holding onto your queries.
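
If it does return true, turning the log off (and clearing anything already collected) before the import should keep memory flat. A minimal sketch, assuming the standard query-log methods on Laravel's database connection, which the DB facade proxies:

// Turn query logging off for the import and drop anything already collected
DB::disableQueryLog();
DB::flushQueryLog();

// ... run the import ...

// Optionally turn it back on afterwards
DB::enableQueryLog();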

About the problem, I think you should use a message queue (Gearman, RabbitMQ, ...) to fix it.

Fetch all records and push them to the queue; the queue will process them one by one.
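
A minimal sketch of that idea using Laravel's own queue system; the job class name, the chunk size, and the dispatch call site are assumptions for illustration, not code from the question:

<?php
// app/Jobs/ImportFeatureChunk.php - hypothetical queued job that inserts one chunk
namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Support\Facades\DB;

class ImportFeatureChunk implements ShouldQueue
{
    use Queueable;

    protected $rows;

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    public function handle()
    {
        // One bounded bulk insert per job, processed by a queue worker
        DB::table('features')->insert($this->rows);
    }
}

In the file reader you would then dispatch one job per chunk of, say, 100 rows instead of inserting directly, so the process that parses the file never holds more than a single chunk in memory.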

On scaffolding my application I installed the itsgoingd/clockwork package. It's a habit I've gotten into because it's such a useful tool. (I recommend that any Laravel devs reading this check it out!)

Of course this package logs queries for debugging purposes, and was thus the culprit eating up all of my memory.

Disabling the package by deleting the reference to its service provider in my configuration solved the issue.
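
Concretely, that meant removing (or commenting out) the package's entry from the providers array in config/app.php. The class name below is the provider the package registers; double-check it against the version you have installed:

// config/app.php
'providers' => [
    // ...
    // Clockwork\Support\Laravel\ClockworkServiceProvider::class,
],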

Cheers to @dbushy727 for giving me a push in the right direction to figure this out.
