Efficient way to update DB2 database rows

I have a table with 92 million rows. I have a list of 4000 IDs from that table whose data needs updating. I put the 4000 IDs into their own table and tried running the following:

update clients
set col1='1', col2='y'
where id in
(select id from idstoupdate)

But this falls over due to memory constraints. So I tried splitting the 4000 IDs into 4 tables, each with 1000 IDs, and it still falls over when I try it on those smaller tables. What's the most efficient way to deal with such a large table?

Thanks.

While there may be ways to update a table from another table through fancy subselects, I believe the best approach is to write a program that does this through the SQL API (whether that is DBI with the DBD::DB2 driver for Perl, JDBC for Java, the C libraries, etc.): perform the SELECT, FETCH each row of the result set with a cursor, and do an UPDATE for each one.

PSEUDOCODE (I don't know what language you are familiar with):

# Open a connection and prepare both statements once, outside the loop
dbHandle = sqllib->open_connection(database, user, password)
select_statement = dbHandle->prepare('SELECT id FROM idstoupdate')
update_statement = dbHandle->prepare('UPDATE clients SET col1=?, col2=? WHERE id=?')
resultset = select_statement->execute()

# Fetch each ID and run the prepared update with bound parameters
foreach (row in resultset) {
  id = row.getColumn('id')
  update_statement->execute('1', 'y', id)
}

dbHandle->disconnect()

You would want to add error checking. If you want either all of the updates to apply or none of them, then you have to begin a transaction and commit the entire transaction only if there were no errors. There is a wealth of material on how to do all of the above in the DB2 Infocenter.
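As a rough illustration, the select, loop, update pattern with all-or-nothing transaction handling might look like the following in Python. This is only a sketch under stated assumptions: it uses the standard-library `sqlite3` module with an in-memory database as a stand-in so that it runs anywhere, the function name `update_clients` is made up for this example, and against DB2 you would connect through a DB2 driver for your language instead.

```python
import sqlite3

def update_clients(conn, col1, col2):
    """Update every client listed in idstoupdate; commit all rows or none."""
    cur = conn.cursor()
    updated = 0
    try:
        # The ID list is small (4000 rows in the question), so read it fully
        # before issuing any updates.
        ids = [row[0] for row in cur.execute("SELECT id FROM idstoupdate")]
        for client_id in ids:
            # One parameterized single-row update per ID.
            cur.execute(
                "UPDATE clients SET col1 = ?, col2 = ? WHERE id = ?",
                (col1, col2, client_id),
            )
            updated += cur.rowcount
        conn.commit()      # no errors: make all updates permanent
    except Exception:
        conn.rollback()    # any error: undo every update made so far
        raise
    return updated

# Stand-in data (assumption: tiny tables with the same column names).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT)")
conn.execute("CREATE TABLE idstoupdate (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO clients VALUES (?, '0', 'n')",
                 [(i,) for i in range(1, 11)])
conn.executemany("INSERT INTO idstoupdate VALUES (?)", [(2,), (5,), (7,)])
conn.commit()

changed = update_clients(conn, "1", "y")
```

Each update touches a single row by its key, so the work done per statement stays small regardless of how large `clients` is.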

Note: If the source data for idstoupdate is a file, then you could skip the select statement and the work of loading the idstoupdate table, and just read the IDs from the file and update the database directly. This would be the most efficient way to handle updates to a table.
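A minimal sketch of that file-driven variant follows. It assumes a text file with one numeric ID per line (the file format, the helper name `update_from_file`, and the sample data are all invented for illustration) and again uses `sqlite3` as a stand-in for a DB2 connection.

```python
import os
import sqlite3
import tempfile

def update_from_file(conn, path, col1, col2):
    """Read one ID per line from `path` and update the matching clients."""
    with open(path) as f:
        # Skip blank lines; int() tolerates the trailing newline.
        params = [(col1, col2, int(line)) for line in f if line.strip()]
    # For sqlite3, `with conn` commits on success and rolls back on error.
    with conn:
        conn.executemany(
            "UPDATE clients SET col1 = ?, col2 = ? WHERE id = ?", params
        )
    return len(params)

# Stand-in table and a temporary ID file (both are assumptions).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT)")
conn.executemany("INSERT INTO clients VALUES (?, '0', 'n')",
                 [(i,) for i in range(1, 11)])
conn.commit()

tmp = tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False)
tmp.write("3\n8\n\n9\n")
tmp.close()

ids_read = update_from_file(conn, tmp.name, "1", "y")
os.remove(tmp.name)
```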

If you simply must update a table from another table with pure SQL, then the most common examples are in this format:

      UPDATE table1 t1
         SET (t1.field1, t1.field2) = 
             (
               SELECT t2.field1, 
                      t2.field2
                 FROM table2 t2
                WHERE t1.joinfield = t2.joinfield 
                  AND t2.criteriafield = 'qualifier'
             )
       WHERE EXISTS 
             ( 
               SELECT 1 
                 FROM table2 t2
                WHERE t1.joinfield = t2.joinfield 
                  AND t2.criteriafield = 'qualifier'
             )    

which eliminates the IN predicate, but is probably not much more efficient in memory or log space. Because it is less straightforward than select, loop, fetch, update, you need to be sure you have all the criteria correct. Your case is a bit simpler. I think this would work, but I'd need a DB2 instance to try it against:

      UPDATE clients t1
         SET t1.col1 = '1', col2 = 'y' 
       WHERE EXISTS 
             ( 
               SELECT 1 
                 FROM idstoupdate t2
                WHERE t1.id = t2.id 
             )  
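One way to check that the criteria are correct before changing any data is to run the same predicate as a SELECT and compare the rows it matches with those matched by the original IN form. The sketch below does this against an in-memory `sqlite3` database as a stand-in (an assumption; the helper `matching_ids` and the sample data are hypothetical), so it shows the idea rather than DB2 behavior.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT)")
conn.execute("CREATE TABLE idstoupdate (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO clients VALUES (?, '0', 'n')",
                 [(i,) for i in range(1, 21)])
conn.executemany("INSERT INTO idstoupdate VALUES (?)", [(4,), (9,), (15,)])

def matching_ids(conn, where_clause):
    """Return the client IDs that a given WHERE clause would select."""
    sql = "SELECT t1.id FROM clients t1 WHERE " + where_clause + " ORDER BY t1.id"
    return [row[0] for row in conn.execute(sql)]

# The predicate from the question and the EXISTS rewrite, as dry runs.
in_ids = matching_ids(conn, "t1.id IN (SELECT id FROM idstoupdate)")
exists_ids = matching_ids(
    conn, "EXISTS (SELECT 1 FROM idstoupdate t2 WHERE t1.id = t2.id)"
)

print(in_ids == exists_ids, exists_ids)
```

If the two lists differ, the rewritten criteria are wrong and the UPDATE should not be run as written.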

Edit: I am actually surprised that the query you gave did not work from SQL squirrel, as it is a legitimate query. It may even perform similarly to the example I gave, as DB2 is very good at optimizing SQL and determining the best access path.

In my answer, I was trying to show the most memory-efficient way to update a table, as well as the general pattern for updating rows in one table from another table using pure SQL (which would cover cases where the other table contains more than just the rows you want to update).

Additionally, I am suspicious of IN predicates that contain more than 20 or so values, even if modern database engines handle them with ease.

However, the best way to check whether the database engine is handling your query efficiently, or to compare two SQL queries, is to use the SQL explain commands.
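To illustrate the idea of comparing plans, the sketch below asks SQLite for the plan of both forms of the update using its `EXPLAIN QUERY PLAN` statement. SQLite is used here only because it runs without a server (an assumption of this example); DB2 has its own explain tooling, with different syntax and output, which is not shown. The helper name `query_plan` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT)")
conn.execute("CREATE TABLE idstoupdate (id INTEGER PRIMARY KEY)")

def query_plan(conn, sql):
    """Return the readable plan steps for a statement without executing it."""
    # The last column of each EXPLAIN QUERY PLAN row holds the description.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

in_plan = query_plan(
    conn,
    "UPDATE clients SET col1 = '1', col2 = 'y' "
    "WHERE id IN (SELECT id FROM idstoupdate)",
)
exists_plan = query_plan(
    conn,
    "UPDATE clients SET col1 = '1', col2 = 'y' "
    "WHERE EXISTS (SELECT 1 FROM idstoupdate t2 WHERE clients.id = t2.id)",
)

for label, plan in (("IN", in_plan), ("EXISTS", exists_plan)):
    print(label)
    for step in plan:
        print("  ", step)
```

The exact wording of the plan steps depends on the engine and its version, so read the output rather than relying on a fixed text.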

The query I posted actually works fine. My issue was caused by using an external program to query the database, as opposed to querying the database by direct input. Sorry for the misinformed question.
