I am trying to import data from a CSV file into a database using a Laravel queue. These CSV files are huge, with around 500k rows.
I read somewhere that with Laravel queues you don't need to worry about connection timeouts, but that doesn't seem to be true. Maybe I was wrong.
Please check my job code for anything wrong in these methods. I am using "League\Csv" to read the CSV file.
public function __construct($data, $error_arr, $error_row_numbers)
{
    $this->data = $data;
    $this->error_arr = $error_arr;
    $this->error_row_numbers = $error_row_numbers;
}

/**
 * Execute the job.
 *
 * @return void
 */
public function handle()
{
    $offset   = $this->data['offset'];
    $limit    = $this->data['limit'];
    $filename = $this->data['file_name'];
    $service  = new Service();
    $table    = 'committees';

    // map_data describes which CSV column should be
    // inserted into which database column
    $map_data = $this->data['map_data'];

    // Get all column names of the table
    $db_header_obj = new Committee();
    $db_header = $db_header_obj->getTableColumns();

    $csv_file_path = storage_path('app/files/committee/') . $filename;

    if (!ini_get("auto_detect_line_endings")) {
        ini_set("auto_detect_line_endings", true);
    }

    $csv = Reader::createFromPath($csv_file_path, 'r');
    $csv->setOutputBOM(Reader::BOM_UTF8);
    $csv->addStreamFilter('convert.iconv.ISO-8859-15/UTF-8');
    $csv->setHeaderOffset(0);
    $csv_header = $csv->getHeader();

    $rec_arr = array();
    $records_arr = array();

    $stmt = (new Statement())
        ->offset($offset)
        ->limit($limit);
    $records = $stmt->process($csv);

    foreach ($records as $record) {
        $rec_arr[] = array_values($record);
    }

    // Trim indexes whose values are empty
    $records_arr = $service->trimArray($rec_arr);

    if (count($records_arr) > 0) {
        foreach ($records_arr as $ck => $cv) {
            $committee_arr = array();
            foreach ($map_data as $mk => $mv) {
                if (isset($mv)) {
                    $data_type = $service->getDatabaseColumnType($table, $mv);
                    // If the column is a datetime type, convert the
                    // CSV value to MySQL datetime format
                    if ($data_type == 'date' || $data_type == 'datetime' || $data_type == 'timestamp') {
                        $datetime = (array) $cv[$mk];
                        $dt = array_shift($datetime);
                        // 'H' (24-hour), not 'h' (12-hour), so afternoon
                        // timestamps are not silently corrupted
                        $dt = date('Y-m-d H:i:s', strtotime($dt));
                        $committee_arr[$mv] = $dt;
                    } else {
                        $committee_arr[$mv] = $cv[$mk];
                    }
                }
            }

            $error_encountered = false;
            DB::beginTransaction();
            if (!empty($committee_arr['com_id'])) {
                try {
                    $committee_row = Committee::updateOrCreate(
                        ['com_id' => $committee_arr['com_id']],
                        $committee_arr
                    );
                    if ($committee_row->wasRecentlyCreated === true) {
                        $committee_row->created_by = $this->data['user_id'];
                    } else {
                        $committee_row->updated_by = $this->data['user_id'];
                    }
                    $committee_row->save();
                } catch (\Exception $e) {
                    $error_encountered = true;
                    $this->error_arr[] = $e->getMessage();
                    $this->error_row_numbers[] = $this->data['row_value'];
                }
            }
            DB::commit();

            // Track which row is currently being processed so the user
            // can be told which CSV row caused an error
            $this->data['row_value'] = $this->data['row_value'] + 1;
        }

        // Advance the offset so the next job fetches the next chunk of the CSV
        $this->data['offset'] = $offset + $limit;

        // Dispatch the same job again with the increased offset
        $committeeInsertJob = (new StoreCommittee($this->data, $this->error_arr, $this->error_row_numbers))
            ->delay(Carbon::now()->addSeconds(3));
        dispatch($committeeInsertJob);
    } else {
        // Store an activity record to keep track of the import
        $activity = new Activity();
        $activity->url = $this->data['url'];
        $activity->action = 'store';
        $activity->description = $table;
        $activity->user_id = $this->data['user_id'];
        $activity->created_at = date('Y-m-d H:i:s');
        $activity->save();

        $arr_data = [
            'filename' => $filename,
            'user_name' => $this->data['user_name'],
            'error' => $this->error_arr,
            'error_row_numbers' => $this->error_row_numbers
        ];

        // Notify the user that the job is complete
        Mail::to($this->data['user_email'])->send(new CSVImportJobComplete($arr_data));
    }

    if (!ini_get("auto_detect_line_endings")) {
        ini_set("auto_detect_line_endings", false);
    }
}
Error (from laravel.log inside storage):
[2019-04-05 07:13:23] local.ERROR: PDOStatement::execute(): MySQL server has gone away (SQL: insert into `jobs` (`queue`, `attempts`, `reserved_at`, `available_at`, `created_at`, `payload`) values (default, 0, , 1554448406, 1554448403, ....................................................(long list)
From the command terminal:
$ php artisan queue:work --tries=3
[2019-04-05 07:09:11][1] Processing: App\Jobs\StoreCommittee
[2019-04-05 07:09:33][1] Processed: App\Jobs\StoreCommittee
[2019-04-05 07:09:36][2] Processing: App\Jobs\StoreCommittee
[2019-04-05 07:09:58][2] Processed: App\Jobs\StoreCommittee
[2019-04-05 07:10:01][3] Processing: App\Jobs\StoreCommittee
[2019-04-05 07:10:23][3] Processed: App\Jobs\StoreCommittee
[2019-04-05 07:10:26][4] Processing: App\Jobs\StoreCommittee
[2019-04-05 07:10:48][4] Processed: App\Jobs\StoreCommittee
[2019-04-05 07:10:51][5] Processing: App\Jobs\StoreCommittee
[2019-04-05 07:11:13][5] Processed: App\Jobs\StoreCommittee
[2019-04-05 07:11:17][6] Processing: App\Jobs\StoreCommittee
[2019-04-05 07:11:40][6] Processed: App\Jobs\StoreCommittee
[2019-04-05 07:11:43][7] Processing: App\Jobs\StoreCommittee
[2019-04-05 07:12:05][7] Processed: App\Jobs\StoreCommittee
[2019-04-05 07:12:08][8] Processing: App\Jobs\StoreCommittee
[2019-04-05 07:12:31][8] Processed: App\Jobs\StoreCommittee
[2019-04-05 07:12:34][9] Processing: App\Jobs\StoreCommittee
[2019-04-05 07:12:57][9] Processed: App\Jobs\StoreCommittee
[2019-04-05 07:13:00][10] Processing: App\Jobs\StoreCommittee
dell@DESKTOP-UQ2 MINGW64 /d/wamp64/www/project(master)
$
(it stops without any error or failed-job notification)
Is there anything I can improve in my job logic? How can I handle connection drops, timeouts, and similar failures? I don't think increasing the timeout is the solution, since there is no guarantee the import will finish within any fixed time.
Is there instead a way to close the database connection and reconnect between each queued job?
You parsed the CSV file and tried to send the entire contents in a single query. MySQL has a variable that makes it reject queries that are too large: max_allowed_packet.
You did this for performance, but a query carrying that much data can run into max_allowed_packet or one of several other networking/MySQL limits. Instead:
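You can check the current limit from a MySQL console (the 64 MB value below is only an example; the server default is often much lower):

```sql
SHOW VARIABLES LIKE 'max_allowed_packet';
-- Raise it if needed (requires SUPER privilege; resets on server restart
-- unless also set in my.cnf)
SET GLOBAL max_allowed_packet = 67108864;
```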
Prepare the statement exactly once. Prepared statements are prepared once and executed many times.
Parse the CSV and loop through the records.
Bind values to the prepared statement and execute it as you go through the loop.
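A sketch of those three steps with plain PDO (the column names here are placeholders, not your actual schema):

```php
// Reuse Laravel's underlying PDO handle and prepare ONCE, outside the loop
$pdo  = DB::connection()->getPdo();
$stmt = $pdo->prepare(
    'INSERT INTO committees (com_id, name) VALUES (:com_id, :name)'
);

// Execute once per CSV record, binding fresh values each iteration
foreach ($records as $record) {
    $stmt->execute([
        ':com_id' => $record['com_id'],
        ':name'   => $record['name'],
    ]);
}
```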
To make everything faster, use transactions: wrap every 1000 records in one. That lets you write simple insert queries that are still fast, because MySQL batches the writes inside the transaction.
Since you're using Laravel, the steps above are easy:
$csv = collect([]); // This is the collection holding your CSV records

// Split the collection into chunks of 1000 records each.
// Note: Collection::chunk() takes the chunk SIZE, not the number of chunks.
$csv->chunk(1000)->each(function ($chunk) {
    \DB::beginTransaction();

    // Create a record for each row in this chunk
    $chunk->each(function ($data) {
        Committee::create($data); // the Eloquent model, not the job class
    });

    \DB::commit();
});
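As for reconnecting between jobs: Laravel's DB facade does expose disconnect() and reconnect(), so a minimal sketch (for the default connection; adapt if you use a named one) would be to force a fresh connection at the top of handle():

```php
use Illuminate\Support\Facades\DB;

public function handle()
{
    // Drop any stale connection left over from a long-running worker,
    // then open a fresh one before touching the database
    DB::disconnect();
    DB::reconnect();

    // ... rest of the import logic
}
```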