
Importing big CSV files to MySQL database using PHP with checking for duplicates

I hope you can help me. I searched a lot, but unfortunately didn't find anything. What's the situation? I've got big CSV files with a single column containing e-mail addresses, about 50,000 lines per file. I'm building an administration panel that lets me import these files to the server using an HTML form and PHP. Importing CSV into a MySQL database through PHP is simple, but I need something more: for every e-mail, check whether it already exists, and if so, skip it. The problem? The table has over a million records, and checking one e-mail takes about 3 seconds. 50,000 records multiplied by 3 seconds... that would take over 40 hours! The PHP script stops responding after less than 10 minutes, so it's impossible to do it this way:

function doesExist($email) {
    // Exact match with = instead of LIKE: LIKE adds pattern-matching
    // overhead and misbehaves on addresses containing % or _.
    $sql = "SELECT COUNT(*) AS counter FROM mailing_subscribers WHERE subscriber_email = :subscriber_email";
    $sth = $this->db->prepare($sql);
    $sth->execute(array(':subscriber_email' => $email));
    // fetch() returns an array by default; request an object explicitly
    // so that $row->counter works.
    $row = $sth->fetch(PDO::FETCH_OBJ);
    return $row->counter > 0;
}

function importCSV($file, $group) {

    $fp = fopen($file['tmp_name'], "r");
    $importsCounter = 0;

    // Prepare once, outside the loop; bound parameters also close the
    // SQL-injection hole the concatenated query had.
    $sql = "INSERT INTO mailing_subscribers (subscriber_email, subscriber_group) VALUES (:email, :grp)";
    $sth = $this->db->prepare($sql);

    while ($csv_line = fgetcsv($fp)) {
        for ($i = 0, $j = count($csv_line); $i < $j; $i++) {
            // One SELECT per address is what makes this approach too slow.
            if (!$this->doesExist($csv_line[$i])) {
                $sth->execute(array(':email' => $csv_line[$i], ':grp' => $group));
                $importsCounter++;
            }
        }
    }
    fclose($fp);

    $_SESSION["feedback_positive"][] = FEEDBACK_FILE_IMPORT_SUCCESSFUL . " Utworzonych wpisów: " . $importsCounter;
}

$file is an entry from the $_FILES array.

Is there any other, faster method to do it?

Below is my suggestion:

1) Load your CSV file into a temporary table; see http://dev.mysql.com/doc/refman/5.1/en/load-data.html

2) LOAD DATA will bulk-load your CSV data very fast, possibly in seconds. Then use an INSERT ... SELECT query to copy the data from the temporary table into the master table while checking for duplicate values.
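Step 1 could look roughly like this (a sketch, assuming the CSV has one e-mail per line; the staging-table name and file path are illustrative, and LOAD DATA LOCAL requires the LOCAL capability to be enabled on both server and client):

```sql
-- Staging table matching the one-column CSV layout (name is illustrative)
CREATE TEMPORARY TABLE TempTable (
    subscriber_email VARCHAR(255) NOT NULL
);

-- Bulk-load the uploaded file; LOCAL reads it from the client side
LOAD DATA LOCAL INFILE '/tmp/upload.csv'
INTO TABLE TempTable
LINES TERMINATED BY '\n'
(subscriber_email);
```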

E.g.

1) Let's assume you have loaded the CSV data into a temporary table named "TempTable".

2) Say your master table name is "mailing_subscribers".

3) Say you do not want duplicate records to be inserted.

Your query will look like:

insert into mailing_subscribers (subscriber_email, cola, colb..)
select subscriber_email, cola, colb..
from TempTable
where subscriber_email not in (select subscriber_email from mailing_subscribers)
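Driven from PHP, the whole import could look roughly like this. This is a sketch under assumptions, not the asker's code: it assumes `$this->db` is a PDO handle opened with `PDO::MYSQL_ATTR_LOCAL_INFILE` enabled, that `$file` is the `$_FILES` entry for the upload, and it reuses the illustrative `TempTable` name; a `NOT EXISTS` subquery is used in place of `NOT IN`, which behaves badly if the column can ever be NULL.

```php
<?php
// Sketch only: $this->db is assumed to be a PDO handle with
// PDO::MYSQL_ATTR_LOCAL_INFILE => true set at connect time.
function importCSV($file, $group) {
    // Quote the temp-file path for safe interpolation into LOAD DATA,
    // which does not accept a bound parameter for the file name.
    $path = $this->db->quote($file['tmp_name']);

    $this->db->exec(
        "CREATE TEMPORARY TABLE TempTable (subscriber_email VARCHAR(255) NOT NULL)"
    );

    // Bulk-load the CSV: seconds instead of hours for 50,000 rows.
    $this->db->exec(
        "LOAD DATA LOCAL INFILE $path
         INTO TABLE TempTable
         LINES TERMINATED BY '\n'
         (subscriber_email)"
    );

    // Copy only addresses not already present in the master table,
    // deduplicating within the file itself via DISTINCT.
    $sth = $this->db->prepare(
        "INSERT INTO mailing_subscribers (subscriber_email, subscriber_group)
         SELECT DISTINCT t.subscriber_email, :grp
         FROM TempTable t
         WHERE NOT EXISTS (SELECT 1 FROM mailing_subscribers m
                           WHERE m.subscriber_email = t.subscriber_email)"
    );
    $sth->execute(array(':grp' => $group));
    $importsCounter = $sth->rowCount();

    $_SESSION["feedback_positive"][] =
        FEEDBACK_FILE_IMPORT_SUCCESSFUL . " Utworzonych wpisów: " . $importsCounter;
}
```

With an index on `mailing_subscribers.subscriber_email`, the single `INSERT ... SELECT` replaces 50,000 round-trip lookups with one set-based operation.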

Please let me know if you face any issue.
