
Strategy for regularly updating datasets from Excel files

I have ~10 Excel files that are produced by a third party, updated each night, and made available as a download. Each contains ~10 fields (all short text or dates) and between ~10,000 and ~1m rows.

I'm planning to create a simple web application to let people search the data. I'll host it on AWS or similar. Search load will be light, maybe ~1000 searches / day.

I have to assume that all the records are unique each night and that I need to completely replace the online dataset.

It's relatively simple for me to convert the data from the Excel files into a database such as Postgres and create a simple search on top of it.

My question is: how do I deal with the time it takes to do the database update each night? Should I create two databases and have my application alternate between them every other night?

What is a typical strategy for dealing with a situation like this?

My current skill set is building simple(ish) web apps with Ruby/Rails/Postgres. I've been intentionally vague about technology because I'm open-minded about what to use, and I'm quite happy to learn something new to solve the problem.

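If you do all of the updates in one go (that is, inside a single transaction), you don't need multiple databases. The whole time you are updating the tables, people see the "old" version, and shortly after COMMIT they will see the complete "new" version.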
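To make the single-transaction idea concrete, here is a minimal sketch. It is an illustration under assumptions, not a definitive implementation: it uses Python's built-in `sqlite3` module as a stand-in so that it runs anywhere, and the table name `records`, its two columns, and the helper names `create_schema`, `replace_dataset` and `search` are all hypothetical. The same BEGIN / DELETE / INSERT / COMMIT pattern should carry over to Postgres, where readers likewise keep seeing the old rows until the transaction commits.

```python
import sqlite3


def create_schema(db_path):
    """Create the (hypothetical) `records` table if it does not exist yet."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS records (name TEXT, updated_on TEXT)"
        )
        conn.commit()
    finally:
        conn.close()


def replace_dataset(db_path, rows):
    """Replace every row of `records` inside one transaction.

    `rows` is any iterable of (name, updated_on) tuples, e.g. rows parsed
    from the nightly Excel files. If anything fails part-way through, the
    transaction is rolled back and the previous dataset stays in place.
    """
    # isolation_level=None disables implicit transactions, so the
    # BEGIN / COMMIT / ROLLBACK below are the only transaction control.
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        conn.execute("BEGIN")
        conn.execute("DELETE FROM records")
        conn.executemany(
            "INSERT INTO records (name, updated_on) VALUES (?, ?)", rows
        )
        conn.execute("COMMIT")  # readers switch to the new data here
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")  # keep the old dataset untouched
        raise
    finally:
        conn.close()


def search(db_path, term):
    """Return rows whose name contains `term`, using a separate connection."""
    conn = sqlite3.connect(db_path)
    try:
        cur = conn.execute(
            "SELECT name, updated_on FROM records WHERE name LIKE ? ORDER BY name",
            ("%" + term + "%",),
        )
        return cur.fetchall()
    finally:
        conn.close()
```

A nightly job would call `replace_dataset(path, rows)` once per download, while the web application keeps calling `search` as usual. One caveat if you port this to Postgres: `TRUNCATE` takes an ACCESS EXCLUSIVE lock on the table, so a plain `DELETE` inside the transaction is the safer choice when searches must not be blocked during the load.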


 