In MySQL, how do I insert only when row doesn't exist and update only when existing version is less

Question

I am looking for a way to only insert when the row does not exist in MySQL, and update when the row exists AND the version of the existing row is less than (or equal to) the version of the new row.

For example, the table is defined as:

CREATE TABLE documents (
  id VARCHAR(64) NOT NULL,
  version BIGINT UNSIGNED NOT NULL,
  data BLOB,
  PRIMARY KEY (id)
);

And contains the following data:

id  version  data
----------------------------
1   3        first data set
2   2        second data set
3   5        third data set

And I want to merge the following table (UPDATE: id column is unique):

id  version  data
----------------------------
1   4        updated 1st
3   3        updated 2nd
4   1        new 4th

And it should produce the following (UPDATE: see how only 1 is updated and 4 is inserted):

id  version  data
----------------------------
1   4        updated 1st
2   2        second data set
3   5        third data set
4   1        new 4th

I've looked at the INSERT ... ON DUPLICATE KEY UPDATE statement, but it doesn't allow a WHERE clause. I also can't really use REPLACE, because it does not allow WHERE either. Is this even possible with a single MySQL statement?

I am using Java and may need to insert/update many records using a PreparedStatement with batching (addBatch). Any help would be appreciated.

UPDATE: Is there any way to use this query with the PreparedStatement in Java? I have a List of Document objects with id, version, and data.

Answer 1

EDIT: In my earlier answer I suggested that a unique constraint is needed on (id, version), but this is actually not necessary. Your unique constraint on id alone is enough for the solution to work.

You should be able to use the REPLACE command as follows:

REPLACE INTO main 
SELECT  IFNULL(m.id, s.id) id, 
        IFNULL(m.version, s.version) version, 
        IFNULL(m.data, s.data) data
FROM       secondary s
LEFT JOIN  main m ON (m.id = s.id AND m.version > s.version);

Test case:

CREATE TABLE main ( 
   id int, 
   version int, 
   data varchar(50), 
   PRIMARY KEY (id)
);

CREATE TABLE secondary (id int, version int, data varchar(50));

INSERT INTO main VALUES (1, 3, 'first data set');
INSERT INTO main VALUES (2, 2, 'second data set');
INSERT INTO main VALUES (3, 5, 'third data set');

INSERT INTO secondary VALUES (1, 4, 'updated 1st');
INSERT INTO secondary VALUES (3, 3, 'udated 2nd');
INSERT INTO secondary VALUES (4, 1, 'new 4th');

Result:

SELECT * FROM main;
+----+---------+-----------------+
| id | version | data            |
+----+---------+-----------------+
|  1 |       4 | updated 1st     |
|  2 |       2 | second data set |
|  3 |       5 | third data set  |
|  4 |       1 | new 4th         |
+----+---------+-----------------+
4 rows in set (0.00 sec)

As a side-note, to help you understand what's happening in that REPLACE command, note the following:

SELECT     s.id s_id, s.version s_version, s.data s_data,
           m.id m_id, m.version m_version, m.data m_data
FROM       secondary s
LEFT JOIN  main m ON (m.id = s.id AND m.version > s.version);

+------+-----------+-------------+------+-----------+----------------+
| s_id | s_version | s_data      | m_id | m_version | m_data         |
+------+-----------+-------------+------+-----------+----------------+
|    1 |         4 | updated 1st | NULL |      NULL | NULL           |
|    3 |         3 | udated 2nd  |    3 |         5 | third data set |
|    4 |         1 | new 4th     | NULL |      NULL | NULL           |
+------+-----------+-------------+------+-----------+----------------+
3 rows in set (0.00 sec)

The IFNULL() functions then take care of "overwriting" the incoming row with the latest version from the main table whenever one is present, as in the case of id=3, version=5. The query therefore returns the following:

SELECT  IFNULL(m.id, s.id) id, 
        IFNULL(m.version, s.version) version, 
        IFNULL(m.data, s.data) data
FROM       secondary s
LEFT JOIN  main m ON (m.id = s.id AND m.version > s.version);

+------+---------+----------------+
| id   | version | data           |
+------+---------+----------------+
|    1 |       4 | updated 1st    |
|    3 |       5 | third data set |
|    4 |       1 | new 4th        |
+------+---------+----------------+
3 rows in set (0.00 sec)

The above result set contains only records from the secondary table, but if any of these records happen to have a newer version in the main table, then the row is overwritten by the data from the main table. This is the input that we are feeding the REPLACE command.
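To drive this from Java, one option is to batch-insert the incoming documents into the secondary table with a PreparedStatement and then run the REPLACE once. The following is only a rough sketch that has not been run against a live server. It assumes the main and secondary tables from the test case above, that secondary is used purely as a staging table (it is emptied first), and a hypothetical Document class with id, version and data fields; the class and method names are made up for illustration.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

class StagedMerge {

    // Hypothetical value type matching the columns of the test tables.
    static final class Document {
        final int id;
        final int version;
        final String data;

        Document(int id, int version, String data) {
            this.id = id;
            this.version = version;
            this.data = data;
        }
    }

    // Loads one incoming row into the staging table.
    static final String STAGE_SQL =
        "INSERT INTO secondary (id, version, data) VALUES (?, ?, ?)";

    // The REPLACE statement from this answer, unchanged in meaning.
    static final String MERGE_SQL =
        "REPLACE INTO main "
      + "SELECT IFNULL(m.id, s.id), IFNULL(m.version, s.version), IFNULL(m.data, s.data) "
      + "FROM secondary s "
      + "LEFT JOIN main m ON (m.id = s.id AND m.version > s.version)";

    // Stages all documents in one batch, then merges them in a single transaction.
    static void merge(Connection conn, List<Document> docs) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (Statement st = conn.createStatement();
             PreparedStatement stage = conn.prepareStatement(STAGE_SQL)) {
            st.executeUpdate("DELETE FROM secondary"); // start from an empty staging table
            for (Document d : docs) {
                stage.setInt(1, d.id);
                stage.setInt(2, d.version);
                stage.setString(3, d.data);
                stage.addBatch();
            }
            stage.executeBatch();
            st.executeUpdate(MERGE_SQL);
            conn.commit();
        } catch (SQLException e) {
            conn.rollback(); // leave main untouched if anything failed
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```

With a List of Document objects, the call would be something like StagedMerge.merge(connection, documents). If several writers can run at once, each would probably need its own staging table (a temporary table, for example) rather than a shared secondary.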

Answer 2

I think INSERT ... ON DUPLICATE KEY UPDATE is your best bet. You can use it like this:

INSERT INTO table1
SELECT * FROM table2
ON DUPLICATE KEY UPDATE
  table1.data    = IF(table1.version > table2.version, table1.data, table2.data),
  table1.version = IF(table1.version > table2.version, table1.version, table2.version);

Untested syntax, but I believe the idea should work.
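For the Java batching part of the question, the same idea can be written with placeholders so that each Document becomes one batch entry. This is a minimal sketch, likewise not run against a live server. It assumes the documents table from the question, a hypothetical Document class with id, version and data fields, and MySQL's VALUES() function inside ON DUPLICATE KEY UPDATE (deprecated in recent MySQL 8.0 releases in favour of row aliases, but still accepted as far as I know). data is assigned before version on purpose: MySQL evaluates the assignments left to right, so the comparison on data has to run while version still holds the old value.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class DocumentUpserter {

    // Hypothetical value type standing in for the asker's Document class.
    static final class Document {
        final String id;
        final long version; // BIGINT UNSIGNED in the table; long covers values below 2^63
        final byte[] data;

        Document(String id, long version, byte[] data) {
            this.id = id;
            this.version = version;
            this.data = data;
        }
    }

    // Inserts a new row, or overwrites the existing row only when the incoming
    // version is greater than or equal to the stored one.
    static final String UPSERT_SQL =
        "INSERT INTO documents (id, version, data) VALUES (?, ?, ?) "
      + "ON DUPLICATE KEY UPDATE "
      + "data = IF(VALUES(version) >= version, VALUES(data), data), "
      + "version = GREATEST(version, VALUES(version))";

    // Queues one parameter set per document and sends them as a single batch.
    static int[] upsertAll(Connection conn, List<Document> docs) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(UPSERT_SQL)) {
            for (Document d : docs) {
                ps.setString(1, d.id);
                ps.setLong(2, d.version);
                ps.setBytes(3, d.data);
                ps.addBatch();
            }
            return ps.executeBatch();
        }
    }
}
```

Usage would be along the lines of DocumentUpserter.upsertAll(connection, documents). No staging table is needed with this form, since every row carries its own values.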

2 answers

solution1
3 2010-09-14 01:59:09

solution2
2 ACCEPTED 2010-09-14 01:37:13
