

What is the best way to update (or replace) an entire database table on a live machine?

I'm being given a data source weekly that I'm going to parse and put into a database. The data will not change much from week to week, but I should be updating the database on a regular basis. Besides this weekly update, the data is static.

For now, rebuilding the entire database isn't a problem, but eventually this database will be live and people could be querying it while I'm rebuilding it. The amount of data isn't small (a couple hundred megabytes), so it won't load instantaneously, and personally I want a more foolproof system than "I hope no one queries while the database is in disarray."

I've thought of a few different ways of solving this problem, and was wondering what the best method would be. Here are my ideas so far:

  1. Instead of replacing entire tables, query for the difference between my current database and what I want to place in the database. This seems like it could be an unnecessary amount of work, though.

  2. Create dummy data tables, then do a table rename (or have the server code point to the new data tables).

  3. Just tell users that the site is going through maintenance and take the system offline for a few minutes. (This is not preferable for obvious reasons, but if it's far and away the best answer I'm willing to accept that.)

Thoughts?

I can't speak for MySQL, but PostgreSQL has transactional DDL. This is a wonderful feature, and means that your second option, loading new data into a dummy table and then executing a table rename, should work well. If you want to replace the table foo with foo_new, you only have to load the new data into foo_new and run a script to do the rename. This script should execute in its own transaction, so if something about the rename goes wrong, both foo and foo_new will be left untouched when it rolls back.

The main problem with that approach is that it can get a little messy to handle foreign keys from other tables that key on foo. But at least you're guaranteed that your data will remain consistent.
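The rename-inside-a-transaction idea can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in (SQLite also rolls back DDL), not PostgreSQL itself, and the helper name swap_tables and the table name foo_old are hypothetical.

```python
import sqlite3

def swap_tables(conn, live="foo", staged="foo_new", retired="foo_old"):
    """Rename staged -> live in one transaction; roll back on any error.

    Table names are interpolated directly, so they must be trusted values.
    """
    conn.execute("BEGIN")
    try:
        conn.execute(f"DROP TABLE IF EXISTS {retired}")
        conn.execute(f"ALTER TABLE {live} RENAME TO {retired}")
        conn.execute(f"ALTER TABLE {staged} RENAME TO {live}")
        conn.execute("COMMIT")
    except Exception:
        # Undo every rename and drop made since BEGIN.
        conn.execute("ROLLBACK")
        raise

# isolation_level=None: the module issues no implicit BEGIN/COMMIT.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, val TEXT)")
conn.execute("INSERT INTO foo VALUES (1, 'old')")
conn.execute("CREATE TABLE foo_new (id INTEGER PRIMARY KEY, val TEXT)")
conn.execute("INSERT INTO foo_new VALUES (1, 'new')")

swap_tables(conn)
rows = conn.execute("SELECT val FROM foo").fetchall()
```

If the second rename fails (for example because foo_new was never created), the rollback restores foo and the previous foo_old. Foreign keys that reference foo are not handled in this sketch.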

A better approach in the long term, I think, is just to perform the updates on the data directly (your first option). Once again, you can stick all the updating in a single transaction, so you're guaranteed all-or-nothing semantics. Even better would be online updates, just updating the data directly as new information becomes available. This may not be an option for you if you need the results of someone else's batch job, but if you can do it, it's the best option.
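Before committing to direct updates, it may help to see how much actually differs between the live table and the new load. The following is a rough sketch, assuming both tables share the same columns; it uses Python's sqlite3 module purely as a runnable stand-in, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (id INTEGER PRIMARY KEY, val TEXT);
    CREATE TABLE foo_new (id INTEGER PRIMARY KEY, val TEXT);
    INSERT INTO foo VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO foo_new VALUES (1, 'a'), (2, 'B'), (4, 'd');
""")

# Rows in the new load that are missing from, or differ in, the live table.
added_or_changed = conn.execute(
    "SELECT id, val FROM foo_new EXCEPT SELECT id, val FROM foo ORDER BY id"
).fetchall()

# Live keys that no longer appear in the new load.
removed_ids = [row[0] for row in conn.execute(
    "SELECT id FROM foo EXCEPT SELECT id FROM foo_new ORDER BY id"
)]
```

Only these rows would need to be written, and the writes could all go in a single transaction to keep the all-or-nothing guarantee.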

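BEGIN;
-- Remove last week's rows; other sessions still see them until COMMIT.
DELETE FROM mytable;
-- Load the new rows (mytable and mytable_staging are placeholder names).
INSERT INTO mytable SELECT * FROM mytable_staging;
COMMIT;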

Users will see the changeover instantly when you hit commit. Any queries started before the commit will run on the old data; anything afterwards will run on the new data. The database will actually clear out the old data once the last user is done with it. Because everything is "static" (you're the only one who ever changes it, and only once a week), you don't have to worry about any lock issues or timeouts. For MySQL, this depends on InnoDB. PostgreSQL does it, and SQL Server calls it "snapshotting," though I can't remember the details off the top of my head since I rarely use the thing.

If you Google "transaction isolation" + the name of whatever database you're using, you'll find appropriate information.
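The visibility behaviour of the BEGIN / DELETE / INSERT / COMMIT sequence above can be observed with two connections. This is a minimal sketch, assuming SQLite (via Python's sqlite3 module) as a stand-in for whatever server you run; the table name mytable, the file name demo.db, and the sample rows are hypothetical, and real servers differ in how they implement isolation.

```python
import os
import sqlite3
import tempfile

# A file-backed database so two independent connections can share it.
path = os.path.join(tempfile.mkdtemp(), "demo.db")

writer = sqlite3.connect(path, isolation_level=None)
# WAL mode lets readers keep working from the last committed snapshot.
writer.execute("PRAGMA journal_mode=WAL")
writer.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, val TEXT)")
writer.execute("INSERT INTO mytable VALUES (1, 'week 1')")

reader = sqlite3.connect(path, isolation_level=None)

# The weekly reload, all inside one transaction.
writer.execute("BEGIN")
writer.execute("DELETE FROM mytable")
writer.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, "week 2"), (2, "week 2")],
)

# A query issued before COMMIT still sees last week's rows.
during = reader.execute("SELECT val FROM mytable ORDER BY id").fetchall()

writer.execute("COMMIT")

# A query issued after COMMIT sees only the new rows.
after = reader.execute("SELECT val FROM mytable ORDER BY id").fetchall()
```

The reader never observes the empty table between the DELETE and the INSERT, which is the property the answer relies on.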

We solved this problem by using PostgreSQL's table inheritance/constraints mechanism. You create a trigger that auto-creates sub-tables partitioned based on a date field.

This article was the source I used.

Which database server are you using? SQL Server 2005 and above provides an isolation level called "Snapshot". It allows you to open a transaction, do all of your updates, and then commit, all while users of the database continue to view the pre-transaction data. Normally, your transaction would lock your tables and block their queries, but snapshot isolation would be perfect in your case.

More info here: http://blogs.msdn.com/craigfr/archive/2007/05/16/serializable-vs-snapshot-isolation-level.aspx

But it requires SQL Server, so if you're using something else....

Several database systems (since you didn't specify yours, I'll keep this general) do offer the SQL:2003 Standard statement called MERGE, which will basically allow you to

  • insert rows from a source into a target table where they don't exist yet
  • update existing rows in the target table based on new values from the source
  • optionally even delete rows from the target that don't show up in the import table anymore

SQL Server 2008 is the first Microsoft offering to have this statement - check out more here, here or here.

Other database systems will probably have similar implementations - it's a SQL:2003 Standard statement after all.
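The three MERGE behaviours listed above can be approximated on engines without MERGE. The sketch below is not MERGE itself: it swaps in SQLite's INSERT ... ON CONFLICT DO UPDATE (available from SQLite 3.24) plus a DELETE, run through Python's sqlite3 module. The table names target and source and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE target (id INTEGER PRIMARY KEY, val TEXT);
    CREATE TABLE source (id INTEGER PRIMARY KEY, val TEXT);
    INSERT INTO target VALUES (1, 'keep'), (2, 'stale'), (3, 'gone');
    INSERT INTO source VALUES (1, 'keep'), (2, 'fresh'), (4, 'added');
""")

conn.execute("BEGIN")
# Insert new rows and update existing ones in a single statement.
# "WHERE true" avoids a parsing ambiguity when an upsert reads from a SELECT.
conn.execute("""
    INSERT INTO target (id, val)
    SELECT id, val FROM source WHERE true
    ON CONFLICT(id) DO UPDATE SET val = excluded.val
""")
# Optionally delete rows that no longer appear in the import table.
conn.execute("DELETE FROM target WHERE id NOT IN (SELECT id FROM source)")
conn.execute("COMMIT")

merged = conn.execute("SELECT id, val FROM target ORDER BY id").fetchall()
```

On a server that does support MERGE, the insert, update and delete branches would normally be expressed in one statement instead of two.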

Marc

Use different table names (mytable_[yyyy]_[wk]) and a view to provide a constant name (mytable). Once a new table is completely imported, update your view so that it uses that table.
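A rough sketch of the view approach, again using Python's sqlite3 module as a stand-in. SQLite has no CREATE OR REPLACE VIEW, so the view is dropped and recreated inside one transaction; other servers may offer a single-statement replacement. The helper name repoint_view and the concrete table names are hypothetical.

```python
import sqlite3

def repoint_view(conn, view, table):
    """Make `view` select from `table`, atomically. Names must be trusted."""
    conn.execute("BEGIN")
    try:
        conn.execute(f"DROP VIEW IF EXISTS {view}")
        conn.execute(f"CREATE VIEW {view} AS SELECT * FROM {table}")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE mytable_2024_01 (id INTEGER PRIMARY KEY, val TEXT);
    CREATE TABLE mytable_2024_02 (id INTEGER PRIMARY KEY, val TEXT);
    INSERT INTO mytable_2024_01 VALUES (1, 'week 1');
    INSERT INTO mytable_2024_02 VALUES (1, 'week 2');
""")

repoint_view(conn, "mytable", "mytable_2024_01")
first = conn.execute("SELECT val FROM mytable").fetchall()

# After the next import finishes, switch the constant name to the new table.
repoint_view(conn, "mytable", "mytable_2024_02")
second = conn.execute("SELECT val FROM mytable").fetchall()
```

Application queries only ever reference mytable, so older weekly tables can be dropped later at leisure.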
