MySQL insert query performance on an indexed column if the column value is NULL

I have a MySQL table with ~50 million rows. I want to add a secondary index on a column that may contain NULL values. Will inserting a row whose value in this column is NULL still be an expensive operation? Or does inserting a row add index overhead only when the indexed column has a non-NULL value?

Let's try an experiment. I created a test table:

mysql> create table mytable (id serial primary key, x int, y int);

I filled it with a few million rows. Then I tested a 1-million-row INSERT of NULLs:

mysql> insert into mytable (x, y) select null, null from mytable limit 1000000;
Query OK, 1000000 rows affected (2.57 sec)

And the same with non-NULL values:

mysql> insert into mytable (x, y) select 1234, 1234 from mytable limit 1000000;
Query OK, 1000000 rows affected (2.60 sec)

Now add an index and try the test again:

mysql> alter table mytable add index (x);

mysql> insert into mytable (x, y) select null, null from mytable limit 1000000;
Query OK, 1000000 rows affected (3.12 sec)

mysql> insert into mytable (x, y) select 1234, 1234 from mytable limit 1000000;
Query OK, 1000000 rows affected (3.21 sec)

Now I add an index on the last column, so there are two index writes instead of just one, and try the test again:

mysql> alter table mytable add index (y);

mysql> insert into mytable (x, y) select null, null from mytable limit 1000000;
Query OK, 1000000 rows affected (3.64 sec)

mysql> insert into mytable (x, y) select 1234, 1234 from mytable limit 1000000;
Query OK, 1000000 rows affected (3.82 sec)

I know this test is flawed. I'm lazy and I'm not reinitializing the table to its initial size before each test. So the table is getting larger and larger, and that probably accounts for the increase in time of each test.

The point is not to prove the answer one way or the other. It's to show that if you have a question like this, you have the opportunity and the responsibility to test it yourself. That's probably going to give better results than asking on Stack Overflow, for several reasons:

  • You don't have to wait for someone to answer, if anyone ever does.

  • You avoid spurious answers from people who don't actually know.

  • You avoid answers based on flawed methodology, like the one I showed above.

There's a reason your education was in Computer Science. You should embrace your role as a scientist, and think of what kind of experiment could give you the answer (with proper methodology).
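One way to address the flaw admitted above is to rebuild the table to the same baseline size before every timed run. The following is only a minimal sketch of that methodology, not the test that was actually run: it assumes Python's built-in `sqlite3` module as a stand-in so it runs without a server, and `BASE_ROWS`, `BATCH_ROWS`, `fresh_table`, `timed_insert` and `run_experiment` are hypothetical names. SQLite timings say nothing about MySQL; to answer the real question, the same structure would have to be pointed at a MySQL test instance.

```python
import sqlite3
import time

BASE_ROWS = 20_000   # hypothetical baseline size, kept small so the sketch runs fast
BATCH_ROWS = 5_000   # hypothetical number of rows inserted per timed run


def fresh_table(indexed):
    """Build a new table of BASE_ROWS rows, optionally with an index on x."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, x INT, y INT)")
    conn.executemany(
        "INSERT INTO mytable (x, y) VALUES (?, ?)",
        ((i, i) for i in range(BASE_ROWS)),
    )
    if indexed:
        conn.execute("CREATE INDEX idx_x ON mytable (x)")
    conn.commit()
    return conn


def timed_insert(indexed, value):
    """Time one batch insert against a freshly rebuilt table."""
    conn = fresh_table(indexed)
    rows = [(value, value)] * BATCH_ROWS
    start = time.perf_counter()
    conn.executemany("INSERT INTO mytable (x, y) VALUES (?, ?)", rows)
    conn.commit()
    elapsed = time.perf_counter() - start
    final_count = conn.execute("SELECT COUNT(*) FROM mytable").fetchone()[0]
    conn.close()
    return elapsed, final_count


def run_experiment():
    """Run all four combinations, each starting from the same table size."""
    results = {}
    for indexed in (False, True):
        for value in (None, 1234):
            results[(indexed, value)] = timed_insert(indexed, value)
    return results


if __name__ == "__main__":
    for (indexed, value), (elapsed, count) in run_experiment().items():
        print(f"indexed={indexed} value={value!r}: {elapsed:.4f}s, {count} rows")
```

Because every run starts from an identical table, any difference between the NULL and non-NULL timings can no longer be blamed on table growth. Repeating each run several times and comparing medians would make the comparison more trustworthy still.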

Think of NULL as just another value.

Think of an INDEX as a list of pairs: the key value and some kind of pointer to the row. (The key value may be NULL.) Also, think of the INDEX as being just like a table, stored in a BTree. It is ordered by the key, just as the data is ordered by the PRIMARY KEY in its BTree.

Adding a row to the table adds a row to the data's BTree and to each secondary INDEX's BTree.

By thinking of NULL as just another value, you can reasonably guess that the various operations don't treat NULL differently.
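The mental model above can be sketched as a toy in Python. This is an illustration only and not how InnoDB is implemented: a sorted list stands in for the BTree, `ToyIndex`, `ToyTable` and `sort_key` are hypothetical names, and key values are assumed to be integers or NULL (`None`). NULL is placed before every other key, which matches how MySQL orders NULLs in ascending sorts.

```python
import bisect


def sort_key(value):
    """Order NULL (None) before every non-NULL integer key."""
    return (0, 0) if value is None else (1, value)


class ToyIndex:
    """A sorted list of (key, row_id) pairs standing in for an index BTree."""

    def __init__(self):
        self.entries = []  # each entry: (sort_key(value), row_id)

    def add(self, value, row_id):
        # The same sorted insert runs whether value is None or not.
        bisect.insort(self.entries, (sort_key(value), row_id))

    def lookup(self, value):
        """Return row ids whose key matches value (None finds the NULL keys)."""
        key = sort_key(value)
        pos = bisect.bisect_left(self.entries, (key,))
        found = []
        while pos < len(self.entries) and self.entries[pos][0] == key:
            found.append(self.entries[pos][1])
            pos += 1
        return found


class ToyTable:
    """Rows keyed by primary key, plus one secondary index on column x."""

    def __init__(self):
        self.rows = {}
        self.index_x = ToyIndex()
        self.next_id = 1

    def insert(self, x, y):
        row_id = self.next_id
        self.next_id += 1
        self.rows[row_id] = (x, y)     # write to the data structure
        self.index_x.add(x, row_id)    # write to the secondary index, NULL or not
        return row_id


table = ToyTable()
table.insert(None, None)
table.insert(1234, 1234)
table.insert(None, 7)
print(len(table.index_x.entries))   # one index entry per row, including NULL rows
print(table.index_x.lookup(None))   # row ids of the rows where x is NULL
```

In this toy, every insert adds exactly one index entry regardless of whether the key is NULL, which is the intuition the paragraphs above describe. Note that `lookup(None)` here plays the role of `WHERE x IS NULL`, not `WHERE x = NULL`.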

So, if the business logic needs NULL, use it without worrying.

There are usage issues with NULL. WHERE x = NULL is almost certainly wrong and should be WHERE x IS NULL. NULL is not equal to anything, including another NULL. And there are other cases where NULL is not fully "just another value".
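The comparison pitfall can be checked with a small experiment. This sketch assumes Python's built-in `sqlite3` module as a stand-in so it runs without a MySQL server; the table `t` and the function `null_comparison_demo` are hypothetical names. The behaviour shown is standard SQL three-valued logic, but it is worth re-running the same three queries in a MySQL client to confirm.

```python
import sqlite3


def null_comparison_demo():
    """Compare `= NULL` with `IS NULL` on a table holding two NULLs and one value."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, x INT)")
    conn.executemany("INSERT INTO t (x) VALUES (?)", [(None,), (None,), (1234,)])

    # x = NULL evaluates to NULL (unknown) for every row, so no row matches.
    eq_count = conn.execute("SELECT COUNT(*) FROM t WHERE x = NULL").fetchone()[0]
    # x IS NULL is the test that actually finds the NULL rows.
    is_count = conn.execute("SELECT COUNT(*) FROM t WHERE x IS NULL").fetchone()[0]
    # Even NULL = NULL is NULL, which Python receives as None.
    null_eq_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

    conn.close()
    return eq_count, is_count, null_eq_null


print(null_comparison_demo())  # (0, 2, None)
```

The `= NULL` query matches no rows even though two rows hold NULL, while `IS NULL` finds both.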
