Processing large files in PHP
What is the best way of processing very large files in PHP? This is my current scenario:
My problem now is: I have one raw file of roughly 200MB, with roughly 180 columns of data, and my PHP script cannot finish processing the whole file because it exhausts all of the 1024MB of memory I allocate in my php.ini file.
Hoping to get recommendations on the best workaround for this problem. Thanks!
Code of the processing part below:
while( !feof($fh) ){
    set_time_limit(0);
    $l_buffer = fgets( $fh, $fsize );
    $l_LineStream = explode( ' ', trim( $l_buffer ) );
    $l_FilteredLineStream = array_filter( $l_LineStream, array( $this, 'RemoveEmptyElement' ) );

    // Capture the BSC homing name from tokens of the form "BSC..._...".
    $l_GrepMatchArray = preg_grep( '/^BSC.*_.*$/', $l_FilteredLineStream );
    foreach( $l_GrepMatchArray as $l_BSCFound ){
        $l_BSCFound = explode( '_', $l_BSCFound );
        $l_BSCHoming = $l_BSCFound[1];
    }

    // Detect the start of a new BTS block ("BTS-<number>").
    $l_GrepMatchArray = preg_grep( '/^BTS-[0-9]*$/', $l_FilteredLineStream );
    foreach( $l_GrepMatchArray as $l_BTSFound ){
        $l_CurrBTS = $l_BTSFound;
    }

    // When the BTS changes, flush the finished record to the CSV file
    // (writing a header row on the first call) instead of keeping it in memory.
    // $l_PrevBTS, $l_CurrBTS and $l_FirstLoop must be initialised before the loop.
    if( $l_PrevBTS != $l_CurrBTS && isset( $l_BTSArray ) && count( $l_BTSArray ) > 0 ){
        if( $l_FirstLoop == true ){
            $this->WriteDataToCSVFile( $l_BTSArray, $l_FilePath, true );
            $l_FirstLoop = false;
        }else{
            $this->WriteDataToCSVFile( $l_BTSArray, $l_FilePath );
        }
    }

    // Start a fresh record for the new BTS.
    if( count( $l_GrepMatchArray ) > 0 ){
        $l_BTSArray = $this->InstantiateEmptyBTSArray();
        $l_BTSArray['BSC'] = $l_BSCHoming;
        $l_BTSArray['BCF'] = $l_FilteredLineStream[0];
        $l_BTSArray['BTS'] = $l_FilteredLineStream[3];
        $l_BTSArray['CELL NAME'] = $l_FilteredLineStream[6];
    }

    // A PLMN value continues on the following line; append that line here.
    if( $l_GetPLMNNextLineData == true && isset( $l_BTSArray['PLMN'] ) ){
        $l_BTSArray['PLMN'] .= trim( $l_buffer );
        $l_GetPLMNNextLineData = false;
    }

    // Parameter lines look like ".(KEY) value"; extract the key/value pair.
    preg_match( '/\.\(.*$/', $l_buffer, $reg_match );
    if( count( $reg_match ) > 0 ){
        $l_KeyName = substr( $reg_match[0], 2, strpos( $reg_match[0], ')' ) - 2 );
        preg_match( '/[[:space:]].*|[-].*/', $reg_match[0], $param_value );
        $l_BTSArray[$l_KeyName] = trim( $param_value[0] );
        if( $l_KeyName == 'PLMN' ){
            $l_GetPLMNNextLineData = true;
        }
    }
    $l_PrevBTS = $l_CurrBTS;
}
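For comparison, a truly streaming pass over a large delimited file can keep memory flat no matter the file size, as long as each line's data is written out immediately and nothing accumulates across iterations. A minimal sketch (the function name and the simple "one record per line" format are assumptions, not the asker's actual format):

```php
<?php
// Sketch: stream a large space-delimited file line by line and write each
// record to CSV immediately, so memory use stays flat regardless of size.
function streamToCsv(string $inPath, string $outPath): int
{
    $in  = fopen($inPath, 'rb');
    $out = fopen($outPath, 'wb');
    $rows = 0;
    while (($line = fgets($in)) !== false) {
        // Split on spaces and drop empty tokens (runs of spaces collapse).
        $fields = array_values(array_filter(explode(' ', trim($line)), 'strlen'));
        if ($fields === []) {
            continue;                 // skip blank lines
        }
        fputcsv($out, $fields);       // flush this row now; retain nothing
        $rows++;
    }
    fclose($in);
    fclose($out);
    return $rows;
}
```

The key property is that `$line` and `$fields` are overwritten on every iteration, so the peak footprint is one line, not the whole file.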
You should check whether your script is really processing the big file line by line (one line at a time). If you process the file line by line, it should not use 1GB+ of memory.
Why do you save to MySQL only at the end of the process? When you have parsed a line, flush it to the database, so you only use a few MB per row. You can use INSERT DELAYED to let the database manage the load and avoid stressing it too much.
If you are running out of 1024MB of memory processing a 200MB file, then I would suggest you have a memory issue somewhere. I would suggest you check your code for areas that could be holding on to resources that are no longer required.
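One simple way to hunt for such a spot is to measure memory before and after the suspicious code, and to `unset()` large structures once they have been written out. A small illustration (the accumulated array here just simulates retained parse results):

```php
<?php
// Sketch: use memory_get_usage() to see where memory is being retained,
// and unset() to release large structures that are no longer required.
$before = memory_get_usage();
$big    = range(1, 100000);   // simulates an accumulated result array
$during = memory_get_usage();
unset($big);                  // release data once it has been flushed out
$after  = memory_get_usage();
printf("held: %d bytes, released: %d bytes\n", $during - $before, $during - $after);
```

Logging `memory_get_peak_usage()` every few thousand loop iterations works the same way: a steadily climbing figure means per-line data is surviving across iterations.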