Bulk REST API POST Processing

I'm migrating 40,000 records from one system to another, and the only way to import data into the receiving system is via REST API POST calls.

I'm looking for advice on the fastest approach to iterate through 40,000 REST API calls. I have the data I need to transfer formatted as JSON, and I've chunked the objects into 40+ .json files using PHP. Ideally I'd like to handle the POSTs asynchronously if possible; any advice on approaches using PHP, JavaScript, Node.js, or bash would be tremendously helpful.

You can make simultaneous POST calls with PHP via curl's multi functions. Comments are in the code.

$json_files = array('1.json','2.json', ... , '40.json');
$count = 0;
foreach($json_files as $json_file) {

    $list_of_objects = json_decode(file_get_contents($json_file),true);

    if(!$list_of_objects) {
        //log error
        continue;
    }

    //chunk into arrays of size 10 
    //or whatever # you want to run simultaneously
    $chunked_list = array_chunk($list_of_objects,10);

    foreach($chunked_list as $chunk) {
        $handles = array();    
        $mh = curl_multi_init();  

        foreach($chunk as $item) {
            $ch = curl_init('your api url here');  
            curl_setopt($ch,CURLOPT_RETURNTRANSFER, 1);
            curl_setopt($ch,CURLOPT_POST, 1);
            //NOTE: http_build_query() posts the item form-encoded; if the API
            //expects raw JSON, use json_encode($item) and set a
            //Content-Type: application/json header instead
            curl_setopt($ch,CURLOPT_POSTFIELDS, http_build_query($item));
            curl_multi_add_handle($mh, $ch);
            //index your handles by item id so 
            //you know what succeeded or failed
            $handles[$item['id']] = $ch;
        }

        //execute all requests in the chunk simultaneously;
        //wait on the sockets rather than busy-looping, and
        //continue when all are complete
        $running = null;
        do {
            $status = curl_multi_exec($mh, $running);
            if ($running) {
                curl_multi_select($mh);
            }
        } while ($running && $status == CURLM_OK);

        foreach($handles as $item_id => $handle) {

            //CURLOPT_RETURNTRANSFER makes getcontent return the response body;
            //compare it against whatever your API returns on success
            if(curl_multi_getcontent($handle) == 'my success message') {
                //log $item_id to success file
            }
            else {
                //log $item_id to fail file so you can retry later
            }

            curl_multi_remove_handle($mh, $handle);        
        }

        curl_multi_close($mh);
        //the last chunk may hold fewer than 10 items
        $count += count($chunk);
        print "$count ...\n";        
    }
}

First off: if you've already used PHP to write those JSON files, I'm sure you can adapt that script to post directly to the new server?

This is a batch job, so you'd assume it's a one-time script (though it's better to write it so you can reuse it). The key thing is to find out how many concurrent requests your new server can handle. At, say, 10 concurrent requests of roughly 1 second each, 40,000 requests come to 4,000 seconds, about 67 minutes, so you should be done within two hours.

And in Node specifically, make sure to set your global parallel request limit to more than 6, if your new server can handle it (http.globalAgent.maxSockets = 20 caps concurrent requests to the same hostname at 20).
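
For illustration, here's a minimal sketch of a Node POST helper with that setting applied, assuming a hypothetical endpoint example.com/api/records that accepts raw JSON (the host, path, and response handling are placeholders to adapt):

var http = require('http');

//raise the per-hostname connection limit from the default
http.globalAgent.maxSockets = 20;

//POST one record as JSON; calls back with (err, responseBody)
function postRecord(record, callback) {
    var body = JSON.stringify(record);
    var req = http.request({
        hostname: 'example.com',   //placeholder host
        path: '/api/records',      //placeholder path
        method: 'POST',
        headers: {
            'Content-Type': 'application/json',
            'Content-Length': Buffer.byteLength(body)
        }
    }, function (res) {
        var chunks = [];
        res.on('data', function (chunk) { chunks.push(chunk); });
        res.on('end', function () {
            callback(null, Buffer.concat(chunks).toString());
        });
    });
    req.on('error', callback);
    req.end(body);
}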

You can use a module like async, or write your own simple module for parallel requests. If you were using async, you'd reach for async.parallelLimit() for this purpose.
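
For example, a sketch along these lines would cap concurrency at 10, where records is your decoded JSON array and postRecord is the hypothetical helper from the previous snippet:

var async = require('async');

//build one task function per record
var tasks = records.map(function (record) {
    return function (done) {
        postRecord(record, done);
    };
});

//run at most 10 tasks at a time; parallelLimit stops on the first error
async.parallelLimit(tasks, 10, function (err, results) {
    if (err) {
        console.error('a request failed:', err);
    } else {
        console.log('all ' + results.length + ' records posted');
    }
});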

To get more specific answers, you'd have to specify your requirements a bit more, and maybe throw in a bit of code.
