Processing structured data into a database from a giant text file using PHP?

I have text files containing structured data (a proprietary format, not something simple or common like CSV). I'm trying to put this data into a database. The text files are as large as 50GB, so it's impossible for me to read an entire file into memory, extract it into an array, and then process it into the database.

The text files are structured so that the data for a particular "item" (a specific id in the database) can span multiple lines. An item always starts with a line that begins with 01 and can be followed by any number of additional lines, one after the other, each starting with 02, 03, and so on up to 08. A new item begins when a line starts with 01.

01some_data_about_the_first_item
02some_more_data_about_the_first_item
05more_data_about_the_first_item
01the_first_line_of_the_second_item

I'd like to use PHP to process this data.

How can I load a piece of this text file into memory where I can analyze it, get all the lines for an item, and then process it? Is there a way to load all lines up to the next line that starts with 01, process that data, then begin the next scan of the text file where the last scan ended?

Processing the data once I've loaded pieces of it into memory is not the problem.

Sure. Since you tagged the question with csv, I'll assume you have a CSV file. In that case, fgetcsv is a good function to use; it gets one line from the file at a time. Using that, you can read as many lines as you need for one record, process it, then continue with the next one. Rough mockup:

$fh = fopen('file.csv', 'r');
$record = array();

while (($line = fgetcsv($fh)) !== false) {
    if ($line[0] === '01' && $record) {
        // a line starting with 01 begins a new record, so flush the
        // current one first; adjust condition as necessary
        /* put current $record into database */

        // start next record
        $record = array();
    }
    $record[] = $line;
}

if ($record) {
    /* put the last $record into database */
}

fclose($fh);
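Since the question says the format is proprietary rather than CSV, the same idea can be sketched with plain fgets and a generator, so that only one item is held in memory at a time. This is a minimal sketch, not a definitive implementation: the function name readItems, the 'code' and 'data' array keys, and the choice to skip blank lines are all assumptions made for illustration.

```php
<?php
/**
 * Yield one item at a time from a file in which every item starts with
 * a line beginning with "01". readItems() is a hypothetical helper name.
 */
function readItems(string $path): Generator
{
    $fp = fopen($path, 'r');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $path");
    }

    try {
        $item = [];
        while (($line = fgets($fp)) !== false) {
            $line = rtrim($line, "\r\n");
            if ($line === '') {
                continue; // assumption: blank lines carry no data
            }
            // A line starting with 01 closes the item collected so far.
            if (strncmp($line, '01', 2) === 0 && $item) {
                yield $item;
                $item = [];
            }
            $item[] = [
                'code' => substr($line, 0, 2),        // 01 to 08
                'data' => (string) substr($line, 2),  // rest of the line
            ];
        }
        // The last item has no following 01 line, so emit it here.
        if ($item) {
            yield $item;
        }
    } finally {
        fclose($fp);
    }
}
```

A caller would loop with foreach (readItems('big.txt') as $item) and write each $item to the database inside the loop body. Memory use then depends on the size of the largest single item, not on the size of the file.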

Here is a start:

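<?php
$fp = fopen('big.txt', 'r');

// compare against false explicitly; a last line of "0" would otherwise
// stop the loop early
while (($line = fgets($fp)) !== false) {
    $number = substr($line, 0, 2); // line type, 01 to 08
    $data = substr($line, 2);      // rest of the line, newline included

    // process each line
    echo $number . ' - ' . $data;
}
fclose($fp);
?>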
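On the part of the question about starting the next scan where the last one ended: within a single run the file handle already remembers its position, so the next fgets call simply continues. If the import has to be split across runs, ftell and fseek can record and restore that position. The following is a rough sketch under stated assumptions: readBatch, its parameters, and the 'items' and 'next' keys are hypothetical names, and a 64-bit PHP build is assumed so that offsets into a 50GB file fit in an integer.

```php
<?php
/**
 * Read up to $maxItems items starting at byte offset $offset.
 * Returns the items plus the offset where the next call should resume.
 * readBatch() is a hypothetical helper, not a built-in function.
 */
function readBatch(string $path, int $offset, int $maxItems): array
{
    $fp = fopen($path, 'r');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $path");
    }
    fseek($fp, $offset);

    $items = [];
    $current = [];
    $next = $offset; // start of the first item not yet returned

    while (true) {
        $lineStart = ftell($fp); // position before reading the line
        $line = fgets($fp);
        $atEnd = ($line === false);

        // End of file or a new 01 line closes the item collected so far.
        if ($atEnd || strncmp($line, '01', 2) === 0) {
            if ($current) {
                $items[] = $current;
                $current = [];
            }
            $next = $lineStart;
            if ($atEnd || count($items) >= $maxItems) {
                break;
            }
        }
        $current[] = rtrim($line, "\r\n");
    }

    fclose($fp);
    return ['items' => $items, 'next' => $next];
}
```

The caller would start at offset 0, store the returned 'next' value somewhere durable after each batch has been written to the database, and pass it back in on the following call. An empty 'items' array means the end of the file was reached. The offset passed in should be 0 or a value returned by an earlier call, because an arbitrary offset could land in the middle of an item.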
