
Strategy for regularly updating datasets from Excel files

I have ~10 Excel files which are produced by a third party, updated each night, and available as a download. Each contains ~10 fields (all short text / dates) and between ~10,000 and ~1 million rows.

I'm planning to create a simple web application to enable people to search the data. I'll host it on AWS or similar. Search load will be light, maybe ~1,000 searches/day.

I have to assume that all the records are different each night and need to completely replace the online dataset.

It's relatively simple for me to convert the data from the Excel files into a database such as Postgres and create a simple search on top of it.
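For reference, a minimal sketch of that import step, assuming the "roo" gem for reading Excel files and the "pg" gem for Postgres; the file name, table, and column names are hypothetical placeholders:

```ruby
# Minimal sketch: stream one Excel file into a Postgres table with COPY.
# Assumes the "roo" and "pg" gems; file, table, and column names are
# hypothetical placeholders.
require "csv"
require "pg"
require "roo"

conn = PG.connect(dbname: "search_app")
xlsx = Roo::Excelx.new("downloads/records.xlsx")

# COPY is much faster than row-by-row INSERTs for hundreds of thousands of rows.
conn.copy_data("COPY records (ref, name, updated_on) FROM STDIN WITH (FORMAT csv)") do
  # offset: 1 skips the header row; pad_cells: true keeps blank cells as nil
  xlsx.each_row_streaming(offset: 1, pad_cells: true) do |row|
    conn.put_copy_data(row.map { |cell| cell && cell.value }.to_csv)
  end
end
```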

My question is how do I deal with the time it takes to do the database update each night? Should I create two databases and have my application alternate between them every other night?

What is a typical strategy for dealing with a situation like this?

My current skill set is Ruby/Rails/Postgres, building simple(ish) web apps. I've been intentionally vague about technology because I'm open-minded about what to use, and I'm quite happy to learn something new to solve the problem.

If you do all the updates in a single transaction, you don't need two databases: while you are updating the table, people will still see the "old" version, and shortly after the COMMIT they will see the "new" version in full.
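To illustrate, a minimal sketch of that single-transaction refresh using the "pg" gem; the table name and the import helper are hypothetical:

```ruby
# Minimal sketch: replace the whole dataset inside one transaction.
# Assumes the "pg" gem; the table name and import helper are hypothetical.
require "pg"

conn = PG.connect(dbname: "search_app")

conn.transaction do
  # DELETE (not TRUNCATE) so concurrent readers keep seeing the old rows
  # until COMMIT; TRUNCATE would take an exclusive lock and block them.
  conn.exec("DELETE FROM records")
  load_excel_files_into(conn)   # e.g. the COPY-based import sketched earlier
end
# After COMMIT, every new query sees only the fresh data.
```

An alternative with the same effect is to load the new data into a staging table and swap it in with ALTER TABLE ... RENAME inside a single transaction; that avoids the large DELETE but briefly takes an exclusive lock during the rename.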
