
Bulk REST API POST Processing

I'm migrating 40,000 records from one system to another, and the only way to import data into the receiving system is via REST API POST calls.

I'm looking for advice on the fastest way to work through 40,000 REST API calls. I have the data I need to transfer formatted as JSON, and I've chunked the objects into 40+ .json files using PHP. Ideally I'd like to handle the POSTs asynchronously if possible; any advice on approaches using PHP, JavaScript, Node.js or bash would be tremendously helpful.

You can make simultaneous POST calls with PHP using curl's multi functions. See the comments in the code.

$json_files = array('1.json','2.json', ... , '40.json');
$count = 0;
foreach($json_files as $json_file) {

    $list_of_objects = json_decode(file_get_contents($json_file),true);

    if(!$list_of_objects) {
        //log error
        continue;
    }

    //chunk into arrays of size 10 
    //or whatever # you want to run simultaneously
    $chunked_list = array_chunk($list_of_objects,10);

    foreach($chunked_list as $chunk) {
        $handles = array();    
        $mh = curl_multi_init();  

        foreach($chunk as $item) {
            $ch = curl_init('your api url here');  
            curl_setopt($ch,CURLOPT_RETURNTRANSFER, 1);
            curl_setopt($ch,CURLOPT_POST, 1);
            curl_setopt($ch,CURLOPT_POSTFIELDS, http_build_query($item));
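            //note: this posts form-encoded fields; if your API expects a raw
            //JSON body instead, you would json_encode($item) here and set a
            //Content-Type: application/json header via CURLOPT_HTTPHEADER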
            curl_multi_add_handle($mh, $ch);
            //index your handles by item id so 
            //you know what succeeded or failed
            $handles[$item['id']] = $ch;
        }

        //execute all 10 posts simultaneously
        //continue when all are complete
        $running = null;
        do {
            curl_multi_exec($mh, $running);
            if ($running) {
                //wait for activity on the handles instead of busy-looping
                curl_multi_select($mh);
            }
        } while ($running > 0);

        foreach($handles as $item_id => $handle) {

            if(curl_multi_getcontent($handle) == 'my success message') {
                //log $item_id to success file
            }
            else {
                //log $item_id to fail file so you can retry later
            }

            curl_multi_remove_handle($mh, $handle);
            //free the individual handle now that its result has been read
            curl_close($handle);
        }

        curl_multi_close($mh);
        $count += 10;
        print "$count ...\n";        
    }
}

First things first: since you've already used PHP to write those JSON files, you can almost certainly adapt that same PHP script to POST directly to the new server.

This is a batch job, so presumably it's a one-off script (though it's better to write it so you can reuse it). The key thing is to find out how many concurrent requests your new server can handle. At, say, 10 concurrent requests of roughly 1 second each, 40k requests work out to about 4,000 seconds, so you should be done well within two hours.

And in Node specifically, make sure to set the global limit on parallel requests to more than 6, if your new server can handle it (http.globalAgent.maxSockets = 20 sets the maximum number of concurrent requests to the same hostname).

You can use a module like async or write your own simple module for parallel requests. If you were using async, you'd have something like async.parallelLimit() for this purpose.
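
For illustration, here is a minimal Node.js sketch along those lines, using the built-in http module together with async.eachLimit (a close relative of parallelLimit that iterates over an array with a concurrency cap). The hostname, path, success check and the single input file are placeholders you'd adapt to your API, and it assumes the async module is installed (npm install async).

var fs = require('fs');
var http = require('http');
var async = require('async');   // npm install async

// allow more parallel sockets per hostname than the default
http.globalAgent.maxSockets = 20;

// read one of the chunked files produced earlier
var records = JSON.parse(fs.readFileSync('1.json', 'utf8'));

// post at most 10 records at a time
async.eachLimit(records, 10, function (item, done) {
    var body = JSON.stringify(item);
    var req = http.request({
        hostname: 'your-api-host',      // placeholder
        path: '/your/endpoint',         // placeholder
        method: 'POST',
        headers: {
            'Content-Type': 'application/json',
            'Content-Length': Buffer.byteLength(body)
        }
    }, function (res) {
        res.resume(); // drain the response body
        if (res.statusCode >= 200 && res.statusCode < 300) {
            // log item.id to a success file
        } else {
            // log item.id to a fail file so you can retry later
        }
        done();
    });
    req.on('error', function (err) {
        // log item.id and err to the fail file
        done();
    });
    req.write(body);
    req.end();
}, function () {
    console.log('all records processed');
});

Raising maxSockets only matters because eachLimit fires up to 10 requests at once; keep the two numbers in line with what the server can actually handle.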

To get more specific answers, you'd have to describe your setup in a bit more detail, and maybe throw in a bit of code.
