What's the best practice for initial (bulk) data import into a RESTful system?

What would you consider a clean and efficient way to initially populate a data storage which is part of a RESTful distributed system architecture?

We already have a POST method for bulk importing, which takes an XML feed, then parses, validates and imports the data. So one possibility would be to have the client POST against our REST interface (probably in chunks, if we run into time-out problems with the request).

The data store itself is based on MongoDB, so on the other hand you could also think about a low-level bulk import, which takes a gzipped data file, uncompresses it and imports the JSON data directly into the database (which of course would circumvent the business logic that validates the imported data).

What is your opinion and recommendation? Is there a REST pattern that gives advice on this problem?

Without knowing more details, I think you've pretty much nailed it already. I would split the data up into chunks, and then run a program that reads these chunks one at a time and POSTs each one to your HTTP interface.

The script/program that does the importing should only work with chunks small enough to avoid timeouts, and should know whether each chunk succeeded or failed. In the event that a chunk does time out or fail, you should make sure that you know where in the import you were, so that you can retry from that same place.
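
To make that concrete, here is a minimal sketch of such an importer, not a definitive implementation. Everything in it is an assumption for illustration: `post_chunk` stands in for whatever function POSTs one chunk to your existing bulk endpoint, the checkpoint is a small JSON file, and the chunk size and retry count are arbitrary defaults. Only network-style errors (`OSError`, which covers timeouts and the errors `urllib` raises) are retried.

```python
import json
import os
from typing import Callable, Iterator, List, Sequence


def chunked(records: Sequence[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield list(records[start:start + size])


def load_checkpoint(path: str) -> int:
    """Return the index of the next chunk to send (0 if no checkpoint exists)."""
    if not os.path.exists(path):
        return 0
    with open(path) as fh:
        return json.load(fh)["next_chunk"]


def save_checkpoint(path: str, next_chunk: int) -> None:
    """Record which chunk should be sent next."""
    with open(path, "w") as fh:
        json.dump({"next_chunk": next_chunk}, fh)


def run_import(
    records: Sequence[dict],
    post_chunk: Callable[[List[dict]], None],  # hypothetical: POSTs one chunk
    checkpoint_path: str,
    chunk_size: int = 500,
    max_retries: int = 3,
) -> bool:
    """Send chunks in order, resuming from the checkpoint.

    Returns True when every chunk was sent, False if one chunk kept failing.
    """
    start = load_checkpoint(checkpoint_path)
    for index, chunk in enumerate(chunked(records, chunk_size)):
        if index < start:
            continue  # already imported in an earlier run
        for _attempt in range(max_retries):
            try:
                post_chunk(chunk)
                break  # this chunk succeeded
            except OSError:
                continue  # timeout or connection error: try again
        else:
            return False  # retries exhausted; checkpoint still points here
        save_checkpoint(checkpoint_path, index + 1)
    return True
```

If `run_import` returns `False`, running it again with the same records, chunk size and checkpoint file picks up at the chunk that failed instead of starting over.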

Having said all that, it's also nice if your system allows the same thing to be imported multiple times without consequence (see http://en.wikipedia.org/wiki/Idempotence ), so that in the event that you have to completely re-send one segment, your RESTful backend will be able to accept it without duplicating data.
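
One common way to get that property is to key every record on a stable identifier and "insert or overwrite" rather than always insert. The sketch below shows the idea with a plain dictionary standing in for the data store; the `id` field and the `import_records` name are assumptions for illustration, not part of your API. In MongoDB the analogous operation would be an upsert on that key.

```python
from typing import Dict, Iterable


def import_records(store: Dict[str, dict], records: Iterable[dict], key: str = "id") -> int:
    """Upsert each record by its key; return how many keys were newly created."""
    created = 0
    for record in records:
        record_id = str(record[key])
        if record_id not in store:
            created += 1
        store[record_id] = dict(record)  # insert or overwrite, never duplicate
    return created


store: Dict[str, dict] = {}
segment = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
first = import_records(store, segment)   # creates both records
second = import_records(store, segment)  # re-send: nothing new is created
```

Sending the same segment twice leaves the store with exactly two records, so a retry after an ambiguous timeout is harmless.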

If you get it working well, you can even run your importing program with multiple chunks simultaneously to make it parallel and faster (so long as your HTTP/RESTful backend can handle the load).
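
A rough sketch of the parallel variant, using a thread pool from the standard library. As before, `post_chunk` is a hypothetical stand-in for the function that POSTs one chunk, and `max_workers=4` is an arbitrary starting point that you would tune to what the backend tolerates. Because chunks no longer finish in order, the sketch reports the indexes of the chunks that failed instead of a single resume position.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, List, Sequence


def import_parallel(
    chunks: Sequence[list],
    post_chunk: Callable[[list], None],  # hypothetical: POSTs one chunk
    max_workers: int = 4,
) -> List[int]:
    """Post chunks concurrently; return the sorted indexes of chunks that failed."""
    failed: List[int] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the index of the chunk it is sending.
        futures = {pool.submit(post_chunk, chunk): i for i, chunk in enumerate(chunks)}
        for future in as_completed(futures):
            if future.exception() is not None:
                failed.append(futures[future])
    return sorted(failed)
```

The returned indexes can simply be fed back in for another pass, which is safe as long as the import is idempotent.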
