
Processing large files in PHP

What is the best way of processing very large files in PHP? This is my current scenario:

  1. I extract a raw file from a Network Management System (NMS) containing all parameters of all network elements (the NMS runs on a UNIX box).
  2. I FTP the raw file to my PC box using PHP.
  3. I process the raw file line by line using PHP's fgets() function.
  4. On every line I use string matching and regexp matching to extract the necessary data, until I am able to compose one line of the necessary data separated by commas (",").
  5. I repeat step 4 until EOF and have the full CSV file.
  6. I then load this data into my database using SQL's "LOAD DATA INFILE"; a rough sketch of that step is shown right after this list.
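
For reference, step 6 amounts to a single LOAD DATA INFILE statement issued from PHP, roughly like the sketch below (connection details, file path, table name and delimiters are placeholders, not my real schema):

// Rough sketch of step 6 only; credentials, path and table name are placeholders.
$mysqli = new mysqli('localhost', 'user', 'password', 'nms_db');
$csvPath = 'C:/exports/bts_dump.csv';   // must be readable by the MySQL server
$sql = "LOAD DATA INFILE '" . $mysqli->real_escape_string($csvPath) . "'
        INTO TABLE bts_parameters
        FIELDS TERMINATED BY ','
        LINES TERMINATED BY '\\n'";
if (!$mysqli->query($sql)) {
    die('LOAD DATA INFILE failed: ' . $mysqli->error);
}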

My problem now is, I have one raw file that is roughly 200MB with roughly 180 columns of data, and because of this my PHP script cannot finish processing the whole file: it exhausts all of the 1024MB of memory I allocate in my php.ini file.

Hoping to have recommendations on the best workaround for this problem. Thanks!

The code of the processing part is below:

while( !feof($fh) ){
    set_time_limit(0);
    // Read the next line; $fsize is used as the maximum line length
    $l_buffer = fgets( $fh, $fsize );
    // Split the line on spaces and drop empty tokens
    $l_LineStream = explode( ' ', trim( $l_buffer ) );
    $l_FilteredLineStream = array_filter( $l_LineStream, array( $this, 'RemoveEmptyElement' ) );
    // Remember which BSC the following BTS entries belong to
    $l_GrepMatchArray = preg_grep( '/^BSC.*_.*$/', $l_FilteredLineStream );
    if( count( $l_GrepMatchArray ) > 0 ){
        foreach( $l_GrepMatchArray as $l_BSCFound ){
            $l_BSCFound = explode( '_', $l_BSCFound );
            $l_BSCHoming = $l_BSCFound[1];
        }
    }
    // Detect the start of a new BTS block
    $l_GrepMatchArray = preg_grep( '/^BTS-[0-9]*$/', $l_FilteredLineStream );
    if( count( $l_GrepMatchArray ) > 0 ){
        foreach( $l_GrepMatchArray as $l_BTSFound ){
            $l_CurrBTS = $l_BTSFound;
        }
    }
    // When the BTS changes, flush the previous BTS record to the CSV file
    if( $l_PrevBTS != $l_CurrBTS && isset( $l_BTSArray ) && count( $l_BTSArray ) > 0 ){
        #$this->BTS_Array[] = $l_BTSArray;
        if( $l_FirstLoop == true ){
            $this->WriteDataToCSVFile( $l_BTSArray, $l_FilePath, true );
            $l_FirstLoop = false;
        }else{
            $this->WriteDataToCSVFile( $l_BTSArray, $l_FilePath );
        }
    }
    // Start a fresh record for the new BTS
    if( count( $l_GrepMatchArray ) > 0 ){
        #var_dump( $l_FilteredLineStream );
        $l_BTSArray = $this->InstantiateEmptyBTSArray();
        #$l_BTSArray['CI'] = '';
        $l_BTSArray['BSC'] = $l_BSCHoming;
        $l_BTSArray['BCF'] = $l_FilteredLineStream[0];
        $l_BTSArray['BTS'] = $l_FilteredLineStream[3];
        $l_BTSArray['CELL NAME'] = $l_FilteredLineStream[6];
    }
    // PLMN values continue on the next line, so append it when flagged
    if( $l_GetPLMNNextLineData == true && isset( $l_BTSArray['PLMN'] ) ){
        $l_BTSArray['PLMN'] .= trim( $l_buffer );
        $l_GetPLMNNextLineData = false;
    }
    // Extract ".(KEY) value" parameter pairs from the current line
    $l_GrepMatchArray = preg_match( '/\.\(.*$/', $l_buffer, $reg_match );

    if( count( $reg_match ) > 0 ){
        $l_KeyName = substr( $reg_match[0], 2, strpos( $reg_match[0], ')' ) - 2 );
        preg_match( '/[[:space:]].*|[-].*/', $reg_match[0], $param_value );
        $l_BTSArray[$l_KeyName] = trim( $param_value[0] );
        if( $l_KeyName == 'PLMN' ){
            $l_GetPLMNNextLineData = true;
        }
    }
    $l_PrevBTS = $l_CurrBTS;
}

You should check if your script is really processing the big file line by line (one line at a time).

  • do you keep the lines you have read in an array?
  • do you write each CSV line to your file immediately, or do you keep all the generated lines in an array?
  • etc.

If you process the file line by line, it should not use 1GB+ memory.
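
A minimal sketch of that pattern, reading with fgets() and writing each record straight to disk with fputcsv() (the file names are placeholders and the parse is reduced to a whitespace split just for illustration):

// Sketch only: stream the input and flush every record straight to the CSV.
$in  = fopen('raw_dump.txt', 'rb');
$out = fopen('output.csv', 'w');

while (($line = fgets($in)) !== false) {
    // Replace this with your real string/regexp matching.
    $fields = preg_split('/\s+/', trim($line), -1, PREG_SPLIT_NO_EMPTY);
    if ($fields) {
        // Written immediately, so memory stays flat regardless of file size.
        fputcsv($out, $fields);
    }
}

fclose($in);
fclose($out);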

Why do you save to MySQL only at the end of the process? When you have parsed a line, flush it to the database, so you only use a few MB per row.
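
For instance, with a prepared statement each parsed row can go straight to MySQL (PDO is used here only for illustration; the DSN, table and column names, and the trivial parse are placeholders):

// Sketch: insert each parsed row immediately instead of buffering the whole file.
$pdo  = new PDO('mysql:host=localhost;dbname=nms_db', 'user', 'password',
                [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
$stmt = $pdo->prepare(
    'INSERT INTO bts_parameters (bsc, bcf, bts, cell_name) VALUES (?, ?, ?, ?)'
);

$fh = fopen('raw_dump.txt', 'rb');
while (($line = fgets($fh)) !== false) {
    // Placeholder parse: keep the first four whitespace-separated tokens.
    $parts = preg_split('/\s+/', trim($line), -1, PREG_SPLIT_NO_EMPTY);
    if (count($parts) >= 4) {
        // Each row goes to MySQL right away, so PHP never holds the whole file.
        $stmt->execute(array_slice($parts, 0, 4));
    }
}
fclose($fh);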

To address the comment:

You can use INSERT DELAYED to let the database manage the load, so that you don't stress it too much.
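
A minimal sketch of that idea with mysqli (table and column names are placeholders; note that INSERT DELAYED is only supported by a few storage engines such as MyISAM and is deprecated/removed in newer MySQL versions):

// Sketch: hand the insert off to MySQL's delayed-insert queue.
$mysqli = new mysqli('localhost', 'user', 'password', 'nms_db');
// Values would normally come from the parsed line; escape them before use.
$bsc  = $mysqli->real_escape_string('BSC01');
$cell = $mysqli->real_escape_string('CELL_A');
// The call returns as soon as the row is queued, not when it is written.
$mysqli->query(
    "INSERT DELAYED INTO bts_parameters (bsc, cell_name) VALUES ('$bsc', '$cell')"
);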

If you are running out of 1024MB of memory processing a 200MB file, then you have a memory issue somewhere. I would suggest you check your code for areas that could be holding on to resources that are no longer required.
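
One simple way to find such a spot is to log memory usage while the loop runs, for example (sketch only; the input file name is a placeholder and the parsing work is elided):

// Sketch: watch memory growth inside the processing loop.
$fh = fopen('raw_dump.txt', 'rb');
$lineNo = 0;
while (($line = fgets($fh)) !== false) {
    $lineNo++;
    // ... existing parsing work goes here ...
    if ($lineNo % 50000 === 0) {
        // If these numbers keep climbing, something is accumulating per line.
        printf("line %d: %.1f MB in use, %.1f MB peak\n",
               $lineNo,
               memory_get_usage(true) / 1048576,
               memory_get_peak_usage(true) / 1048576);
    }
}
fclose($fh);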
