On a table with over 10 million rows, which one of these UPDATE
statements likely to faster, or are they much of a muchness?
UPDATE table SET col1 = col2 WHERE col1 IS NULL;
UPDATE table SET col1 = col2 WHERE col1 IS NULL AND col2 IS NOT NULL;
Neither col1
nor col2
are keys or indexed. They are actually datetime
fields, if that changes anything.
Note: I'm fairly sure that the queries achieve the same result:
Query 1. will update col1
every time, including setting it to NULL
if col2 is NULL
.
Query 2. will update col1
only if col2
has a value, thereby leaving col1
as NULL
if col2 is NULL
.
They both change the data in the same way. One sets col1
to NULL
when col2 is NULL
, the other leaves col1
as NULL
when col2 is NULL
, but in all cases col1
was NULL
to begin with.
Performance is better measured than guessed. Yet here is my guess: It heavily depends on your usage of indeces , and their cardinality .
That means: if col2 has an index with thousands of different values and null beeing one of them, it´ll be faster than query 1. If there is no query and col2 almost always is null, it will be slower.
Best performance will be achieved if you have a two-column-index spanning both cols, and use both cols in the where-part .
It actually depends on different factors.
UPDATE table SET col1 = col2;
Will do a full table scan and modification in any case.
These queries
UPDATE table SET col1 = col2 WHERE col1 IS NULL;
UPDATE table SET col1 = col2 WHERE col1 IS NULL AND col2 IS NOT NULL;
will result in faster operation, if col1 and col2 are indexed.
You can get information about what is done in which case if you execute
EXPLAIN UPDATE table SET col1 = col2;
EXPLAIN UPDATE table SET col1 = col2 WHERE col1 IS NULL;
EXPLAIN UPDATE table SET col1 = col2 WHERE col1 IS NULL AND col2 IS NOT NULL;
In my experience the second format is usually preferable, but of course it will depend on your data. It is important to note that indexes on the columns in question may not include null values, depending on the database provider and the way that the indexes are defined. Even if indexes are defined, they may not improve performance, and depending on your statistics they may not be used at all.
In response to the various comments & answers about measuring rather than guessing, and also to the comments regarding the count of rows where col2 is NULL
vs col2
having a value of some kind, I ran 4 tests on a test table:
| Query 1 | Query 2
col2 almost all NULL | 5 min 27.96 sec | 4 min 20.38 sec
col2 almost all has values | 34 min 31.64 sec | 32 min 31.54 sec
Conclusion: Query 2 is marginally faster in both cases .
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.