
Creating a REST API to allow upload of large data sets

I am currently creating a suite of REST APIs that will be used to upload an undetermined number of rows of information to our database. These APIs would be consumed by developers from a third-party company.

The volume would start at a daily bulk upload of about 4k rows of information, with an estimated increase of up to 5k additional rows within about 4 months. My question is: what would be the best way to design such an upload API?

Before I write down some of the ideas I've been reading about, here are some considerations to take into account.

  • The upload of information and the use of these APIs will almost always only be done once a day.
  • The overall structure of a row of information looks like this, times 4k.

    "data": [ {"InfoID": 1, "InfoName": "HELLO", "InfoValue": 1.00, "InfoDate": "2019-01-01"}, {"InfoID": 2, "InfoName": "WORLD", "InfoValue": 2.00, "InfoDate": "2019-01-02"} ]

Some of the ideas I've read about for designing this type of API are:

  • Limit the number of rows that can be uploaded per request and control position with a page-number field in the JSON body. This would mean the third-party team would have to implement that pagination when extracting and uploading the information from their database (see the sketch after this list).
  • Upload a CSV file. The file upload could also be split into paginated chunks in case the file is too large.
  • A POST API that uploads one row of information at a time, but I believe this is not the best option for such large data sets.
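For the first option, a hypothetical paginated request body might look like the following; the page, pageSize, and totalPages fields are illustrative names I am assuming for the example, not something already defined:

    {
      "page": 1,
      "pageSize": 1000,
      "totalPages": 4,
      "data": [
        {"InfoID": 1, "InfoName": "HELLO", "InfoValue": 1.00, "InfoDate": "2019-01-01"},
        {"InfoID": 2, "InfoName": "WORLD", "InfoValue": 2.00, "InfoDate": "2019-01-02"}
      ]
    }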

Any opinions, recommendations, and ideas would be helpful in making a design decision.

I would suggest a single endpoint that accepts POST requests. Let the body of the request be the entire batch of data in whatever formats you choose to accept - JSON, XML, CSV, etc. Have clients set the Content-Type header to indicate which format they're sending. Parse that format and apply the batch of changes. If it's going to take more than a second or so to reply, send a 202 Accepted right away along with a Location header pointing to an endpoint where they can get a progress report on how the batch processing is going.
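To make that concrete, here is a minimal sketch of that shape using Flask. It is only an illustration of the idea, not part of the original answer: the /uploads path, the in-memory job store, and the thread-based worker are all assumptions made for the example.

    # Minimal sketch: single batch endpoint, Content-Type dispatch, 202 + Location.
    # The job store and worker are stand-ins assumed for the example.
    import csv
    import io
    import threading
    import uuid

    from flask import Flask, jsonify, request, url_for

    app = Flask(__name__)
    jobs = {}  # job_id -> {"status": ..., "processed": int}


    def process_batch(job_id, rows):
        # Stand-in for the real database load.
        for i, row in enumerate(rows, start=1):
            jobs[job_id]["processed"] = i
        jobs[job_id]["status"] = "done"


    @app.route("/uploads", methods=["POST"])
    def create_upload():
        # Dispatch on Content-Type to parse the batch.
        if request.content_type and request.content_type.startswith("application/json"):
            rows = request.get_json()["data"]
        elif request.content_type and request.content_type.startswith("text/csv"):
            rows = list(csv.DictReader(io.StringIO(request.get_data(as_text=True))))
        else:
            return jsonify({"error": "unsupported Content-Type"}), 415

        job_id = str(uuid.uuid4())
        jobs[job_id] = {"status": "processing", "processed": 0}
        threading.Thread(target=process_batch, args=(job_id, rows)).start()

        # 202 Accepted plus a Location header pointing at a progress resource.
        return jsonify({"job": job_id}), 202, {"Location": url_for("upload_status", job_id=job_id)}


    @app.route("/uploads/<job_id>", methods=["GET"])
    def upload_status(job_id):
        return jsonify(jobs.get(job_id, {"status": "unknown"}))

A client would then POST the whole batch once, get back the job URL from the Location header, and poll it until processing finishes.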

Note that you'll have to decide how to handle uploads that have some bad entries in them - either fail the whole batch or accept what you can.
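If you go with accepting what you can, one hypothetical shape for the progress/result report might be the following (all field names here are illustrative, not prescribed):

    {
      "status": "done",
      "accepted": 3998,
      "rejected": [
        {"InfoID": 2041, "error": "InfoDate is not a valid date"}
      ]
    }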

Pagination is probably overkill. Based on the example you gave, each row serializes to roughly 80 bytes, so 5k entries comes to somewhere around 400 KB - well under a single megabyte. Weigh that against the annoyance of the client having to futz with pagination. As a client, I wouldn't want to have to do that.

Requiring clients to POST 4k times to upload all their data is probably not the right approach because of the performance cost. It's also unlikely that clients will want to split the data up themselves just to write that loop.
