Creating a REST API to allow upload of large data sets

I am currently creating a suite of REST APIs that will be used to upload an undetermined number of rows of information to our database. These APIs will be used by developers at a third-party company.

The volume would start at a daily bulk upload of about 4k rows of information, expected to grow by up to 5k more rows within about 4 months. My question is: what would be the best way to design such an upload API?

Before I write down some of the ideas I've been reading about, here are some considerations to take into account:

  • The upload of information and the use of these APIs will almost always be done only once a day.
  • The overall structure of a row of information looks like this, times 4k:

    "data": [ {"InfoID": 1, "InfoName": "HELLO", "InfoValue": 1.00, "InfoDate": "2019-01-01"}, {"InfoID": 2, "InfoName": "WORLD", "InfoValue": 2.00, "InfoDate": "2019-01-02"} ]

Some of the ideas I've read about for designing this type of API are:

  • Limit the number of rows that can be uploaded in the JSON payload, controlled by a page-number parameter. This would mean the third-party team has to implement that pagination when retrieving the information from their database and uploading it.
  • Upload a CSV file. This might also involve paginating the file upload in case the file is too heavy.
  • A POST API that uploads rows of information one by one, but I believe this is not the best option for such large data sets.

Any opinions, recommendations, and ideas would be helpful in making a design decision.

I would suggest a single endpoint that accepts POST requests. Let the body of the request be the entire batch of data in whatever formats you choose to accept it in - JSON, XML, CSV, etc. Have clients specify the Content-Type header to indicate what format they're sending the information in, and parse that format to apply the batch of changes. If it's going to take more than a second or so to reply, send a 202 Accepted right away along with a Location header pointing to an endpoint where they can get a progress report on how the batch processing is going.
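To make that concrete, here's a minimal sketch of that shape, assuming Flask; the /uploads path, the in-memory job store, and the point where you'd hand off to a background worker are all illustrative, not prescriptive:

    import csv
    import io
    import uuid

    from flask import Flask, request, jsonify, url_for

    app = Flask(__name__)
    jobs = {}  # job_id -> status; a real service would persist this


    @app.route("/uploads", methods=["POST"])
    def upload_batch():
        # Dispatch on Content-Type so one endpoint accepts every format.
        content_type = request.content_type or ""
        if content_type.startswith("application/json"):
            rows = request.get_json()["data"]
        elif content_type.startswith("text/csv"):
            rows = list(csv.DictReader(io.StringIO(request.get_data(as_text=True))))
        else:
            return jsonify(error="unsupported Content-Type"), 415

        # Record the job and reply 202 with a Location the client can poll.
        job_id = str(uuid.uuid4())
        jobs[job_id] = {"status": "processing", "rows": len(rows)}
        # ... hand `rows` to a background worker here ...
        response = jsonify(job=job_id)
        response.status_code = 202
        response.headers["Location"] = url_for("job_status", job_id=job_id)
        return response


    @app.route("/uploads/<job_id>", methods=["GET"])
    def job_status(job_id):
        # The progress-report endpoint the Location header points at.
        return jsonify(jobs.get(job_id, {"status": "unknown"}))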

Note that you'll have to decide how to handle uploads that contain some bad entries - either fail the whole batch or accept what you can.
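If you go with "accept what you can", the batch processor can validate row by row and report failures individually. A sketch of that idea - the field checks and the save() stub are assumptions based on the example rows above:

    from datetime import date

    def save(row):
        pass  # stand-in for the real database insert

    def apply_batch(rows):
        accepted, rejected = 0, []
        for index, row in enumerate(rows):
            try:
                date.fromisoformat(row["InfoDate"])  # e.g. "2019-01-01"
                float(row["InfoValue"])
                save(row)
                accepted += 1
            except (KeyError, ValueError) as err:
                rejected.append({"row": index, "error": repr(err)})
        # The progress endpoint can then report something like:
        # {"accepted": 3998, "rejected": [{"row": 17, "error": "..."}]}
        return {"accepted": accepted, "rejected": rejected}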

Pagination is probably overkill. Based on the example you gave, 5k entries is probably less than a single megabyte. Weigh that against the annoyance of the client having to futz with pagination. As a client, I wouldn't want to have to do that.
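A quick back-of-the-envelope check supports that, using the sample row shape from the question:

    import json

    row = {"InfoID": 1, "InfoName": "HELLO", "InfoValue": 1.00, "InfoDate": "2019-01-01"}
    per_row = len(json.dumps(row))       # roughly 75-80 bytes per row
    print(per_row * 5000 / 1024)         # on the order of 400 KB for 5k rows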

Requiring clients to POST 4k times to get all their data up is probably not the right idea because of the performance cost. It's also unlikely that clients will want to split the data themselves and write that loop.
