
PHP Import large CSV file into MySQL table

I need to run a daily cron job that iterates over a 6 MB CSV file to insert each of the ~10,000 entries into a MySQL table. The code I have written hangs and produces a timeout after a while.

if (($handle = fopen($localCSV, "r")) !== FALSE) {
    while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
        $dbdata = array(
            'SiteID' => $siteID,
            'TimeStamp' => $data[0],
            'ProductID' => $data[1],
            'CoordX' => $data[2],
            'CoordY' => $data[3]
        );  
        $row++;
        $STH = $DBH->prepare("INSERT INTO temp_csv (SiteID,TimeStamp,ProductID,CoordX,CoordY) VALUES (:SiteID,:TimeStamp,:ProductID,:CoordX,:CoordY)");
        $STH->execute($dbdata);
    }
    fclose($handle);
    echo $row." rows inserted.";
}

It would have been ideal to use mysql_* functions instead of PDO, so I could implode the values into one single query (although huge), but unfortunately I need to comply with some guidelines (PDO to be strictly used).

I searched SO and there are very similar questions, but none could solve mine. What I tried is the following:

1- Ran LOAD DATA INFILE and LOAD DATA LOCAL INFILE queries but kept getting "file not found" errors although the file is definitely there with 777 permissions. The DB server and the shared hosting account are in different environments. I tried relative and URL paths to the CSV file but had no luck (the file couldn't be found in either case).

2- I split the CSV file into 2 files and ran the script on each, to see the threshold at which the script hangs, but it inserted the entries twice in the table in the case of each file.

I don't have access to php.ini since it's a shared hosting account (cloudsites), and I only have access to MySQL through phpMyAdmin.

What else can I try to accomplish this as efficiently as possible?

Any help is appreciated.

The code doesn't look wrong to me. It hangs because it simply takes a while to execute. You should use PHP's set_time_limit to prevent timeouts. Each call restarts the timeout counter from zero, so calling it inside the loop gives every iteration a fresh allowance.

if (($handle = fopen($localCSV, "r")) !== FALSE) {
    while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
        set_time_limit(30); // choose a value that works for you
        // ... the rest of your script

Better, however, would be to start a background process that processes the CSV. It would need some sort of locking so that it doesn't run in multiple instances in parallel. If you write the status into a file on disk, you can easily present it to your users. The same applies to a cron script (if you can do that with your hosting solution).
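As a minimal sketch of the locking and status-file idea, assuming the script may write to the system temp directory: the helper names acquire_import_lock and write_import_status and both file paths are made up for illustration. It relies only on PHP's built-in flock and file_put_contents.

```php
<?php
// Hypothetical helper: try to take an exclusive, non-blocking lock.
// Returns the open handle on success (keep it open while the import runs),
// or null if the file cannot be opened or another instance holds the lock.
function acquire_import_lock(string $lockFile)
{
    $handle = fopen($lockFile, 'c'); // 'c': create if missing, never truncate
    if ($handle === false) {
        return null;
    }
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return null;
    }
    return $handle;
}

// Hypothetical helper: write a one-line progress message that another
// page can read and show to users.
function write_import_status(string $statusFile, int $rowsDone): void
{
    file_put_contents($statusFile, $rowsDone . " rows processed\n", LOCK_EX);
}

$lockFile   = sys_get_temp_dir() . '/csv_import.lock';   // assumed path
$statusFile = sys_get_temp_dir() . '/csv_import.status'; // assumed path

$lock = acquire_import_lock($lockFile);
if ($lock === null) {
    echo "Another import is already running.\n";
} else {
    write_import_status($statusFile, 0);
    // ... run the import here, updating the status every few batches ...
    flock($lock, LOCK_UN);
    fclose($lock);
}
```

The lock is released automatically if the script dies, because the operating system drops it when the file handle is closed, so a crashed run does not block the next cron invocation.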

The use of PDO looks OK to me. I wouldn't think of inserting all rows of the CSV at once, but you can insert multiple rows at once with PDO, too. Create the statement and the data array for multiple rows. It could look like this rough sketch (I did not execute it, so there will probably be some errors):

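function insert_data($DBH, array $dbdata, array $values) {
    // Single quotes matter here: inside double quotes PHP would try to
    // interpolate $s and break the %1$s format specifier.
    $sql = 'INSERT INTO temp_csv (SiteID,TimeStamp,ProductID,CoordX,CoordY) VALUES %1$s;';
    $STH = $DBH->prepare(sprintf($sql, join(', ', $values)));
    $STH->execute($dbdata);
}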

if (($handle = fopen($localCSV, "r")) !== FALSE) {
    $dbdata = array();
    $values = array();
    $row = 0;
    while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
        // Every placeholder in $values needs a matching key in $dbdata,
        // so SiteID is bound once per row like the other columns.
        $dbdata['SiteID_'.$row] = $siteID;
        $dbdata['TimeStamp_'.$row] = $data[0];
        $dbdata['ProductID_'.$row] = $data[1];
        $dbdata['CoordX_'.$row] = $data[2];
        $dbdata['CoordY_'.$row] = $data[3];
        $values[] = sprintf('(:SiteID_%1$s,:TimeStamp_%1$s,:ProductID_%1$s,:CoordX_%1$s,:CoordY_%1$s)', $row);
        $row++;

        // flush a batch every 10 rows
        if ($row % 10 === 0) {
            set_time_limit(30);
            insert_data($DBH, $dbdata, $values);
            $values = array();
            $dbdata = array();
        }
    }
    // insert the rest
    if (count($values)) {
        insert_data($DBH, $dbdata, $values);
    }
    fclose($handle);
    echo $row." rows inserted.";
}
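A variant of the same batching idea, sketched with positional ? placeholders so that no per-row parameter names are needed. The function name build_batch_insert and the sample rows are made up for illustration; the table and column names are taken from the question. Treat it as a sketch rather than tested production code.

```php
<?php
// Hypothetical helper: build one multi-row INSERT plus its flat parameter list.
// $rows is a list of rows, each holding 5 values in column order.
function build_batch_insert(array $rows): array
{
    if (count($rows) === 0) {
        throw new InvalidArgumentException('At least one row is required.');
    }
    $columns = 'SiteID,TimeStamp,ProductID,CoordX,CoordY';
    // One "(?,?,?,?,?)" group per row, joined with commas.
    $groups = implode(', ', array_fill(0, count($rows), '(?,?,?,?,?)'));
    $sql = 'INSERT INTO temp_csv (' . $columns . ') VALUES ' . $groups;

    // Flatten the rows into one list, in the same order as the placeholders.
    $params = [];
    foreach ($rows as $row) {
        foreach ($row as $value) {
            $params[] = $value;
        }
    }
    return [$sql, $params];
}

// Illustrative data only; real rows would come from fgetcsv().
[$sql, $params] = build_batch_insert([
    [7, '2013-01-01 00:00:00', 'P1', '1.5', '2.5'],
    [7, '2013-01-01 00:01:00', 'P2', '3.0', '4.0'],
]);
echo $sql, "\n";
// With a PDO connection in $DBH, the batch would then be sent as:
// $DBH->prepare($sql)->execute($params);
```

Keeping the SQL construction in a pure function makes it easy to check the generated statement without a database connection.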

The shortcut to at least read the php.ini configuration is phpinfo(). Look into the PHP manual; many of the config values can be set at runtime from your code.
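For example, individual settings can be read with ini_get and, where the host permits it, changed with ini_set. The sketch below is only an illustration: the helper name current_limits and the value 300 are assumptions, and a shared host may refuse the change.

```php
<?php
// Hypothetical helper: collect the settings most relevant to a long import.
function current_limits(): array
{
    return [
        'max_execution_time' => ini_get('max_execution_time'),
        'memory_limit'       => ini_get('memory_limit'),
    ];
}

print_r(current_limits());

// Try to raise the execution limit at runtime. ini_set() returns the old
// value on success and false if the setting could not be changed.
$old = ini_set('max_execution_time', '300');
if ($old === false) {
    echo "The host does not allow changing max_execution_time.\n";
}

print_r(current_limits());
```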


 