
Processing large files in PHP

What is the best way of processing very large files in PHP? This is my current scenario:

  1. I extract a raw file from a Network Management System (NMS) containing all parameters of all network elements (the NMS runs on a UNIX box).
  2. I FTP the raw file to my PC box using PHP.
  3. I process the raw file line by line using PHP's fgets() function.
  4. On every line I use string matching and regexp matching to extract the necessary data, until I am able to compose one line of the necessary data separated by commas (",").
  5. I repeat step 4 until EOF and have the full CSV file.
  6. I then load this data into my database using SQL's "LOAD DATA INFILE"; a rough sketch of that step is shown right after this list.
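
For reference, step 6 amounts to a single LOAD DATA INFILE statement issued from PHP, roughly like the sketch below (connection details, file path, table name and delimiters are placeholders, not my real schema):

// Rough sketch of step 6 only; credentials, path and table name are placeholders.
$mysqli = new mysqli('localhost', 'user', 'password', 'nms_db');
$csvPath = 'C:/exports/bts_dump.csv';   // must be readable by the MySQL server
$sql = "LOAD DATA INFILE '" . $mysqli->real_escape_string($csvPath) . "'
        INTO TABLE bts_parameters
        FIELDS TERMINATED BY ','
        LINES TERMINATED BY '\\n'";
if (!$mysqli->query($sql)) {
    die('LOAD DATA INFILE failed: ' . $mysqli->error);
}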

My problem now is, I have one raw file that is roughly 200MB with roughly 180 columns of data, and because of this my PHP script cannot finish processing the whole file: it exhausts all of the 1024MB of memory I allocate in my php.ini file.

Hoping to have recommendations on the best workaround for this problem. Thanks!

The code of the processing part is below:

while( !feof($fh) ){
    set_time_limit(0);
    // Read the next line; $fsize is used as the maximum line length
    $l_buffer = fgets( $fh, $fsize );
    // Split the line on spaces and drop empty tokens
    $l_LineStream = explode( ' ', trim( $l_buffer ) );
    $l_FilteredLineStream = array_filter( $l_LineStream, array( $this, 'RemoveEmptyElement' ) );
    // Remember which BSC the following BTS entries belong to
    $l_GrepMatchArray = preg_grep( '/^BSC.*_.*$/', $l_FilteredLineStream );
    if( count( $l_GrepMatchArray ) > 0 ){
        foreach( $l_GrepMatchArray as $l_BSCFound ){
            $l_BSCFound = explode( '_', $l_BSCFound );
            $l_BSCHoming = $l_BSCFound[1];
        }
    }
    // Detect the start of a new BTS block
    $l_GrepMatchArray = preg_grep( '/^BTS-[0-9]*$/', $l_FilteredLineStream );
    if( count( $l_GrepMatchArray ) > 0 ){
        foreach( $l_GrepMatchArray as $l_BTSFound ){
            $l_CurrBTS = $l_BTSFound;
        }
    }
    // When the BTS changes, flush the previous BTS record to the CSV file
    if( $l_PrevBTS != $l_CurrBTS && isset( $l_BTSArray ) && count( $l_BTSArray ) > 0 ){
        #$this->BTS_Array[] = $l_BTSArray;
        if( $l_FirstLoop == true ){
            $this->WriteDataToCSVFile( $l_BTSArray, $l_FilePath, true );
            $l_FirstLoop = false;
        }else{
            $this->WriteDataToCSVFile( $l_BTSArray, $l_FilePath );
        }
    }
    // Start a fresh record for the new BTS
    if( count( $l_GrepMatchArray ) > 0 ){
        #var_dump( $l_FilteredLineStream );
        $l_BTSArray = $this->InstantiateEmptyBTSArray();
        #$l_BTSArray['CI'] = '';
        $l_BTSArray['BSC'] = $l_BSCHoming;
        $l_BTSArray['BCF'] = $l_FilteredLineStream[0];
        $l_BTSArray['BTS'] = $l_FilteredLineStream[3];
        $l_BTSArray['CELL NAME'] = $l_FilteredLineStream[6];
    }
    // PLMN values continue on the next line, so append it when flagged
    if( $l_GetPLMNNextLineData == true && isset( $l_BTSArray['PLMN'] ) ){
        $l_BTSArray['PLMN'] .= trim( $l_buffer );
        $l_GetPLMNNextLineData = false;
    }
    // Extract ".(KEY) value" parameter pairs from the current line
    $l_GrepMatchArray = preg_match( '/\.\(.*$/', $l_buffer, $reg_match );

    if( count( $reg_match ) > 0 ){
        $l_KeyName = substr( $reg_match[0], 2, strpos( $reg_match[0], ')' ) - 2 );
        preg_match( '/[[:space:]].*|[-].*/', $reg_match[0], $param_value );
        $l_BTSArray[$l_KeyName] = trim( $param_value[0] );
        if( $l_KeyName == 'PLMN' ){
            $l_GetPLMNNextLineData = true;
        }
    }
    $l_PrevBTS = $l_CurrBTS;
}

You should check if your script is really processing the big file line by line (one line at a time).

  • do you keep the lines you have read in an array?
  • do you write each CSV line to your file immediately, or do you keep all the generated lines in an array?
  • etc.

If you process the file line by line, it should not use 1GB+ memory.
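
A minimal sketch of that pattern, reading with fgets() and writing each record straight to disk with fputcsv() (the file names are placeholders and the parse is reduced to a whitespace split just for illustration):

// Sketch only: stream the input and flush every record straight to the CSV.
$in  = fopen('raw_dump.txt', 'rb');
$out = fopen('output.csv', 'w');

while (($line = fgets($in)) !== false) {
    // Replace this with your real string/regexp matching.
    $fields = preg_split('/\s+/', trim($line), -1, PREG_SPLIT_NO_EMPTY);
    if ($fields) {
        // Written immediately, so memory stays flat regardless of file size.
        fputcsv($out, $fields);
    }
}

fclose($in);
fclose($out);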

Why do you save to MySQL only at the end of the process? When you have parsed a line, flush it to the database, so you only use a few MB per row.
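
For instance, with a prepared statement each parsed row can go straight to MySQL (PDO is used here only for illustration; the DSN, table and column names, and the trivial parse are placeholders):

// Sketch: insert each parsed row immediately instead of buffering the whole file.
$pdo  = new PDO('mysql:host=localhost;dbname=nms_db', 'user', 'password',
                [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
$stmt = $pdo->prepare(
    'INSERT INTO bts_parameters (bsc, bcf, bts, cell_name) VALUES (?, ?, ?, ?)'
);

$fh = fopen('raw_dump.txt', 'rb');
while (($line = fgets($fh)) !== false) {
    // Placeholder parse: keep the first four whitespace-separated tokens.
    $parts = preg_split('/\s+/', trim($line), -1, PREG_SPLIT_NO_EMPTY);
    if (count($parts) >= 4) {
        // Each row goes to MySQL right away, so PHP never holds the whole file.
        $stmt->execute(array_slice($parts, 0, 4));
    }
}
fclose($fh);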

To address the comment:

You can use INSERT DELAYED to let the database manage the load, so that you don't stress it too much.
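
A minimal sketch of that idea with mysqli (table and column names are placeholders; note that INSERT DELAYED is only supported by a few storage engines such as MyISAM and is deprecated/removed in newer MySQL versions):

// Sketch: hand the insert off to MySQL's delayed-insert queue.
$mysqli = new mysqli('localhost', 'user', 'password', 'nms_db');
// Values would normally come from the parsed line; escape them before use.
$bsc  = $mysqli->real_escape_string('BSC01');
$cell = $mysqli->real_escape_string('CELL_A');
// The call returns as soon as the row is queued, not when it is written.
$mysqli->query(
    "INSERT DELAYED INTO bts_parameters (bsc, cell_name) VALUES ('$bsc', '$cell')"
);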

If you are running out of 1024MB of memory processing a 200MB file, then you have a memory issue somewhere. I would suggest you check your code for areas that could be holding on to resources that are no longer required.
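
One simple way to find such a spot is to log memory usage while the loop runs, for example (sketch only; the input file name is a placeholder and the parsing work is elided):

// Sketch: watch memory growth inside the processing loop.
$fh = fopen('raw_dump.txt', 'rb');
$lineNo = 0;
while (($line = fgets($fh)) !== false) {
    $lineNo++;
    // ... existing parsing work goes here ...
    if ($lineNo % 50000 === 0) {
        // If these numbers keep climbing, something is accumulating per line.
        printf("line %d: %.1f MB in use, %.1f MB peak\n",
               $lineNo,
               memory_get_usage(true) / 1048576,
               memory_get_peak_usage(true) / 1048576);
    }
}
fclose($fh);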
