
Fetching millions of records through API

I have an API call from my application to another application of mine via cURL, passing POST variables, like so:

$curl = curl_init();
curl_setopt_array($curl, array(
    CURLOPT_URL            => $url,
    CURLOPT_POST           => 1,
    CURLOPT_POSTFIELDS     => $paramString,
    CURLOPT_RETURNTRANSFER => 1,   // return the response as a string instead of printing it
    CURLOPT_TIMEOUT        => 600, // allow up to 10 minutes for the whole transfer
    CURLOPT_CONNECTTIMEOUT => 60,  // and up to 1 minute to connect
));
$response = curl_exec($curl);

In the application behind $url I try to fetch 20 million records from one table and then return them as JSON, like this:

public function apiMethod()
{
    // Loads all matching rows into memory at once, then encodes them as one JSON string.
    $response = $this->_db->fetchAll('SELECT t.hash FROM table t WHERE id BETWEEN 20000000 AND 40000000;');
    echo json_encode($response);
}

This uses PDO, of course.

OK, there are two problems with that:

1. fetchAll() doesn't work for that many records: it runs out of memory.
2. JSON doesn't work for that many records either: the encoded string grows far too large.

I've been thinking about calling cURL several times and fetching, say, 100,000 records per call instead of fetching everything at once. Is that the only way? What's the best way to do it?
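Something along these lines is what I have in mind (a rough sketch only; the offset and limit POST parameters are placeholders I would still have to add to the API):

// Sketch: call the API repeatedly, 100,000 rows per request, until a page comes back empty.
$limit  = 100000;
$offset = 0;

do {
    $curl = curl_init();
    curl_setopt_array($curl, array(
        CURLOPT_URL            => $url,
        CURLOPT_POST           => 1,
        CURLOPT_POSTFIELDS     => http_build_query(array(
            'offset' => $offset,   // placeholder parameter names
            'limit'  => $limit,
        )),
        CURLOPT_RETURNTRANSFER => 1,
        CURLOPT_TIMEOUT        => 600,
        CURLOPT_CONNECTTIMEOUT => 60,
    ));
    $page = json_decode(curl_exec($curl), true);
    curl_close($curl);

    // process or store the current page here instead of keeping everything in memory
    $offset += $limit;
} while (!empty($page));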

Your main problem is the architecture.

The best approach, apparently, is not to have an API that requires transferring zillions of rows on every call.

Either implement a method that retrieves just one row, which is what an API is suited for, or reconsider the whole architecture: disk replication, database replication, or something similar.

You definitely should not use fetchAll, since it just fills up your memory. Are you sure you need a full data transfer every time? Often you only need to transfer the differences. That, of course, makes your API much more complex.
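As a sketch of that point (assuming a MySQL database behind PDO; the DSN and credentials are placeholders), you could iterate the statement row by row with an unbuffered query, so that only one row sits in PHP's memory at a time:

// Sketch: stream rows one by one instead of loading everything with fetchAll().
$pdo = new PDO($dsn, $user, $password);
// MySQL-specific: keep the result set on the server instead of buffering it in PHP.
$pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

$stmt = $pdo->query('SELECT t.hash FROM table t WHERE id BETWEEN 20000000 AND 40000000');
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    // handle one row at a time, e.g. write it straight to the output or to a file
    echo $row['hash'], "\n";
}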

You either have to implement a stable connection and push your data every ten thousand rows, or you could prepare a file (also in chunks of ten thousand rows) with a cron job and transfer that file with something like a plain file transfer.

If you write a file, you could "fake" the JSON array parts "[" and "]" yourself and just concatenate all your encoded rows.
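A rough sketch of that idea (the output path is just a placeholder, and $stmt is assumed to be a statement streamed as above):

// Sketch: build a valid JSON array on disk without json_encode()-ing all rows at once.
$out = fopen('/tmp/export.json', 'w');   // placeholder path
fwrite($out, '[');

$first = true;
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    if (!$first) {
        fwrite($out, ',');               // comma between array elements
    }
    fwrite($out, json_encode($row));     // encode one row at a time
    $first = false;
}

fwrite($out, ']');
fclose($out);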

Are you sure JSON is the right format? If you only have one column, there really isn't much structure to it.
