How a NULL Default on a MySQL Column Affects a Unique Index

Updated: 2022-09-23 17:13:57 Author: 我是大明哥

This article explains in detail how a column that defaults to NULL affects a MySQL unique index. It is offered as a reference for anyone who runs into the same problem.

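Main text

In everyday business development you often need to guarantee that some data is unique, user registration being a typical case. A registration flow usually lets a user sign up with a phone number or an email address, and each must be unique so that the same account cannot be registered twice. When a user enters a phone number or email to log in, the program checks whether that value already exists: if it does, the login flow runs; if not, the registration flow runs. Checking in the application alone is not enough to guarantee uniqueness; a MySQL unique index is needed as the last line of defense. So what happens when a column in a unique index has its default value set to NULL?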

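The MySQL index rules in the Alibaba Java Development Manual state: [Mandatory] A field that is unique in business terms, even when it is a combination of several fields, must have a unique index.

Notes:

Do not assume that a unique index slows inserts down noticeably; the cost is negligible, while the gain in lookup speed is significant.

In addition, even if the application layer performs very thorough validation, as long as there is no unique index, Murphy's Law says dirty data will eventually appear.

Why a unique index affects insert speed

In MySQL (InnoDB), a unique index other than the primary key is a secondary (non-clustered) index tree. On every insert, the engine searches that tree to check whether the value being inserted already exists. That extra lookup is why a unique index adds some cost to inserts.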

MySQL version: a MySQL instance started in Docker

mysql> select version();
+-----------+
| version() |
+-----------+
| 8.0.18    |
+-----------+
1 row in set (0.00 sec)

Suppose registration is by email only:

# Table definition
CREATE TABLE `user_1` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT COMMENT 'primary key',
  `email` varchar(32) NOT NULL DEFAULT '' COMMENT 'email',
  `name` varchar(11) DEFAULT '' COMMENT 'name',
  PRIMARY KEY (`id`),
  UNIQUE KEY `uk-email` (`email`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Insert data

# First insert:
insert into user_1(email,name) values('aaa@qq.com','aaa');
Affected rows: 1, Time: 0.003000s
# Insert the same data again:
insert into user_1(email,name) values('aaa@qq.com','aaa');
1062 - Duplicate entry 'aaa@qq.com' for key 'uk-email', Time: 0.005000s

At this point uniqueness works as expected and each email is guaranteed to be unique. Now suppose the business grows and phone-number registration has to be added. The table then needs a phone column, and the combination of email and phone must be unique.

# Table definition; note that the phone column now defaults to NULL
CREATE TABLE `user_2` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT COMMENT 'primary key',
  `email` varchar(32) NOT NULL DEFAULT '' COMMENT 'email',
  `phone` char(11) DEFAULT NULL COMMENT 'phone number',
  `name` varchar(11) DEFAULT '' COMMENT 'name',
  PRIMARY KEY (`id`),
  UNIQUE KEY `uk-email-phone` (`email`,`phone`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Insert data

insert into user_2(email,name) values('aaa@qq.com','aaa');
Affected rows: 1, Time: 0.003000s
insert into user_2(email,name) values('aaa@qq.com','aaa');
Affected rows: 1, Time: 0.003000s
insert into user_2(email,name) values('aaa@qq.com','aaa');
Affected rows: 1, Time: 0.003000s
insert into user_2(email,phone,name) values('bbb@qq.com','13333333333','bbb');
Affected rows: 1, Time: 0.003000s
insert into user_2(email,phone,name) values('bbb@qq.com','13333333333','bbb');
1062 - Duplicate entry 'bbb@qq.com-13333333333' for key 'uk-email-phone', Time: 0.002000s

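As the output shows, the first three rows, which carry no phone value, were all inserted successfully, whereas for the rows that carry both an email and a phone number the uniqueness check worked as expected.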

mysql> select * from user_2;
+----+------------+-------------+------+
| id | email      | phone       | name |
+----+------------+-------------+------+
|  1 | aaa@qq.com | NULL        | aaa  |
|  2 | aaa@qq.com | NULL        | aaa  |
|  3 | aaa@qq.com | NULL        | aaa  |
|  4 | bbb@qq.com | 13333333333 | bbb  |
+----+------------+-------------+------+
4 rows in set (0.00 sec)
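
To find out whether a table already holds such duplicates, a grouping query along the following lines may help. This is only a sketch against the user_2 table defined above; the alias dup_rows is a name chosen here for illustration.

```sql
-- Sketch: list every email that has more than one row whose phone is NULL.
-- These are the rows that slipped past the uk-email-phone unique index.
SELECT email, COUNT(*) AS dup_rows
FROM user_2
WHERE phone IS NULL
GROUP BY email
HAVING COUNT(*) > 1;
```

With the four rows shown above, this should report aaa@qq.com with a count of 3.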

This is where MySQL's unique index mechanism comes in. The official MySQL documentation on indexes states:

A UNIQUE index creates a constraint such that all values in the index must be distinct. An error occurs if you try to add a new row with a key value that matches an existing row. If you specify a prefix value for a column in a UNIQUE index, the column values must be unique within the prefix length. A UNIQUE index permits multiple NULL values for columns that can contain NULL.

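In other words: a unique index imposes a constraint that all values in the index must be distinct, and adding a row whose key matches an existing row raises an error. If a prefix length is specified for a column, the values must be unique within that prefix. For columns that may contain NULL, however, a unique index permits any number of NULL values.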
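
The behaviour matches how SQL compares NULL in general. As a small illustration (the column aliases are chosen here for readability and are not from the original article), the ordinary = operator does not treat two NULLs as equal, while MySQL's NULL-safe operator <=> does:

```sql
-- Sketch: comparing NULL with the ordinary and the NULL-safe operators.
SELECT NULL = NULL   AS eq_result,       -- NULL: the comparison is "unknown"
       NULL <=> NULL AS null_safe_eq,    -- 1: the NULL-safe operator treats them as equal
       NULL IS NULL  AS is_null_result;  -- 1
```

Because NULL = NULL never evaluates to true, two rows whose indexed column is NULL are not considered to carry the same key.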

First, look at the EXPLAIN execution plans:

mysql> explain select * from user_2 where email='aaa@qq.com' and phone is NULL;
+----+-------------+--------+------------+------+----------------+----------------+---------+-------------+------+----------+-----------------------+
| id | select_type | table  | partitions | type | possible_keys  | key            | key_len | ref         | rows | filtered | Extra                 |
+----+-------------+--------+------------+------+----------------+----------------+---------+-------------+------+----------+-----------------------+
|  1 | SIMPLE      | user_2 | NULL       | ref  | uk-email-phone | uk-email-phone | 132     | const,const |    3 |   100.00 | Using index condition |
+----+-------------+--------+------------+------+----------------+----------------+---------+-------------+------+----------+-----------------------+
1 row in set, 1 warning (0.01 sec)
mysql>
mysql> explain select * from user_2 where email='bbb@qq.com' and phone='13333333333';
+----+-------------+--------+------------+-------+----------------+----------------+---------+-------------+------+----------+-------+
| id | select_type | table  | partitions | type  | possible_keys  | key            | key_len | ref         | rows | filtered | Extra |
+----+-------------+--------+------------+-------+----------------+----------------+---------+-------------+------+----------+-------+
|  1 | SIMPLE      | user_2 | NULL       | const | uk-email-phone | uk-email-phone | 132     | const,const |    1 |   100.00 | NULL  |
+----+-------------+--------+------------+-------+----------------+----------------+---------+-------------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)

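Notice something interesting: both statements use the unique index uk-email-phone, yet the first has type ref while the second has type const. In an EXPLAIN plan, const generally appears only for lookups by primary key or by a unique index, while ref generally appears for lookups on an ordinary (non-unique) index. With phone IS NULL the optimizer cannot assume that at most one row matches, and the plan does estimate 3 rows. We can therefore conclude that MySQL treats NULL values in a unique index specially at a lower level.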

Looking at line 1863 of the source file, we find this comment:

Scans a unique non-clustered index at a given index entry to determine whether a uniqueness violation has occurred for the key value of the entry. Set shared locks on possible duplicate records

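That is, the function scans a unique non-clustered index at the given index entry to determine whether the entry's key value causes a uniqueness violation, and it sets shared locks on records that may be duplicates.

So row_ins_scan_sec_index_for_duplicate() is the function that performs the unique index check. Reading further, at line 1892 there is another comment: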

If the secondary index is unique, but one of the fields in the n_unique first fields is NULL, a unique key violation cannot occur, since we define NULL != NULL in this case

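That is, if the secondary index is unique but any of its unique fields is NULL, no uniqueness violation can occur, because the code defines NULL != NULL in this case.

Further down, at line 1996, execution reaches row_ins_dupl_error_with_rec(), which is defined at line 1825. That function contains the following code: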

/* In a unique secondary index we allow equal key values if they
  contain SQL NULLs */
  if (!index->is_clustered() && !index->nulls_equal) {
    for (i = 0; i < n_unique; i++) {
      if (dfield_is_null(dtuple_get_nth_field(entry, i))) {
        return (FALSE);
      }
    }
  }

In other words, when any field of the unique index entry is NULL, the function returns FALSE, so no DB_DUPLICATE_KEY error is reported.
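
One common way to avoid the problem, sketched here as an assumption rather than something the original article prescribes, is to declare every column of the unique index NOT NULL with a non-NULL default such as the empty string. The table name user_3 is hypothetical and mirrors user_2 apart from the phone column.

```sql
-- Sketch: same layout as user_2, but phone is NOT NULL with an empty-string default,
-- so rows without a phone number share the same key value and collide in the unique index.
CREATE TABLE `user_3` (
  `id` int unsigned NOT NULL AUTO_INCREMENT COMMENT 'primary key',
  `email` varchar(32) NOT NULL DEFAULT '' COMMENT 'email',
  `phone` char(11) NOT NULL DEFAULT '' COMMENT 'phone number',
  `name` varchar(11) DEFAULT '' COMMENT 'name',
  PRIMARY KEY (`id`),
  UNIQUE KEY `uk-email-phone` (`email`,`phone`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- First insert succeeds; phone falls back to ''.
INSERT INTO user_3(email, name) VALUES ('aaa@qq.com', 'aaa');
-- Second insert is expected to fail with error 1062 (duplicate entry),
-- because ('aaa@qq.com', '') is already present in uk-email-phone.
INSERT INTO user_3(email, name) VALUES ('aaa@qq.com', 'aaa');
```

The trade-off is that an empty string now stands for "no phone number", so application code has to treat '' as missing rather than relying on NULL.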