
Working with a large amount of really large files in PHP

I have a bunch of text files that look something like this:

987654 Example 1
321987 Test 2
654321 Whatever 1

Each column represents a specific value (e.g., ID, timestamp, name, etc.). I'm trying to funnel all of this into a MySQL table. I need to read each line of these files individually and work out which part of each line should go into which column of the row.

Each file contains about 5,000,000 lines. I tried doing a test with just this:

$test = array();
for ($i = 1; $i < 5000000; $i++) {
    $test[] = '';
}

Even a blank array with that many elements maxes out my memory limit (64 MB, and it needs to stay at that because my host doesn't allow anything larger), so turning the file into an array is impossible, and probably a little silly to consider in retrospect. I'm out of my element here because I've never had to do something like this before.

How can I do something like a foreach over each line in the file without using an array?

Check whether MySQL's built-in LOAD DATA INFILE statement fits your needs.
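Something along these lines could do the whole import in one statement. This is just a sketch, assuming a table named records with columns (id, name, version), space-delimited fields with no spaces inside the name column, and that LOCAL INFILE is permitted on both the client and the server:

$pdo = new PDO(
    'mysql:host=localhost;dbname=mydb;charset=utf8mb4',
    'user',
    'password',
    [PDO::MYSQL_ATTR_LOCAL_INFILE => true]  // required for LOCAL INFILE
);

$sql = "LOAD DATA LOCAL INFILE '/path/to/file.txt'
        INTO TABLE records
        FIELDS TERMINATED BY ' '
        LINES TERMINATED BY '\\n'
        (id, name, version)";

$pdo->exec($sql);

MySQL reads and parses the file itself, so PHP never has to hold any of it in memory.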

If not, you can use the PHP SplFileObject class to iterate over your files' lines without loading them all into memory. It has specific methods for parsing lines like that, such as SplFileObject::fgetcsv() and SplFileObject::fscanf(). In this case you might want to use PDO so that a MySQL transaction commits all the insert statements at once to speed up the import process, or rolls them all back if something goes wrong.
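A minimal sketch of that combination, again assuming the space-delimited format from the question, a hypothetical table records (id, name, version), and names without embedded spaces:

$pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8mb4', 'user', 'password');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$stmt = $pdo->prepare('INSERT INTO records (id, name, version) VALUES (?, ?, ?)');
$file = new SplFileObject('/path/to/file.txt');

$pdo->beginTransaction();
try {
    while (!$file->eof()) {
        // Reads one line and parses it according to the format string.
        $fields = $file->fscanf('%d %s %d');
        if (is_array($fields) && !in_array(null, $fields, true)) {
            $stmt->execute($fields);
        }
    }
    $pdo->commit();
} catch (Exception $e) {
    $pdo->rollBack();  // undo everything if any insert fails
    throw $e;
}

Only one line is in memory at a time. For 5,000,000 rows you may prefer to commit in batches (say, every 10,000 inserts) instead of holding one huge transaction.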

I agree with sectus: do the LOAD DATA INFILE and let MySQL do the dirty work.

If you absolutely need to use PHP, another way around it would be some kind of 'parallel processing'; this SO question has more info on that.

If you decide to use the PHP approach, you should read the file line by line using fgets and then hand each chunk of lines off to a different worker to be processed. That way you don't exhaust your allowed memory and should get the work done in less time.
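As a rough sketch of the line-by-line part (processLine() is a hypothetical per-row handler standing in for whatever parsing, queueing, or inserting you do):

$handle = fopen('/path/to/file.txt', 'r');
if ($handle === false) {
    die('Unable to open file');
}

while (($line = fgets($handle)) !== false) {
    // e.g. "987654 Example 1" -> [987654, 'Example', 1]
    $fields = sscanf($line, '%d %s %d');
    if (is_array($fields)) {
        processLine($fields);  // hypothetical: insert the row or push it to a worker
    }
}

fclose($handle);

Only the current line is ever held in memory, so the 64 MB limit is not an issue.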

For such large files, you need the BigDump script if your files are properly delimited. It is easy to use, very effective, and fast. I use it to import big files like this into MySQL. BigDump
