這篇文章主要介紹了MySQL?Varchar?類型尾部空格,在這里需要注意的是?binary?排序規(guī)則的?pad?屬性為?NO?PAD，這里其實不是個例外，因為?char、varchar?和?text?類型都?xì)w類為?nonbinary，感興趣的朋友跟隨小編一起學(xué)習(xí)下吧

背景

近期發(fā)現(xiàn)系統(tǒng)中某個輸入框里如果輸入xxx+空格的時候會出現(xiàn)異常情況，經(jīng)過排查發(fā)現(xiàn)在調(diào)用后端接口時會有兩步操作，一是從數(shù)據(jù)庫中查詢到的數(shù)組中將與xxx+空格一致的元素剔除，二是根據(jù)xxx+空格從數(shù)據(jù)庫中查詢對應(yīng)的明細(xì)。

出現(xiàn)異常的原因是在剔除時未能剔除掉對應(yīng)的元素，也就意味著xxx+空格對應(yīng)的內(nèi)容在數(shù)據(jù)庫中不存在；但是在查詢明細(xì)時還是查詢到了，頓時感覺很費(fèi)解，也就衍生出了這篇文章后續(xù)的內(nèi)容。

原因

開發(fā)人員在處理前端傳過來的字符串時沒有執(zhí)行 trim()，所以導(dǎo)致與數(shù)組中元素匹配的時候沒有匹配到，也就沒能剔除對應(yīng)的元素，"a".equals("a ") 的結(jié)果肯定是 false 嘛。
MySQL 在查詢時會忽略掉字符串最后的空格，所以導(dǎo)致xxx+空格作為查詢條件時和xxx為同一效果。

詳解

對于第一條原因只能說是開發(fā)時疏漏，沒什么可說的，我們著重了解下第二條，為什么 MySQL 會忽略掉查詢條件最后的空格。本文基于 MySQL 8.0.28，文章中有些內(nèi)容是 MySQL 8.0 新增的，但主體也適用于 5.x 版本。

在探究之前我們需要準(zhǔn)備下使用的數(shù)據(jù)庫，畢竟實踐出來的結(jié)果才是真實的，首先我們準(zhǔn)備一個測試使用的數(shù)據(jù)庫和表，結(jié)構(gòu)如下，字符集和排序規(guī)則先選擇比較常用的 utf8mb4 和 utf8mb4_unicode_ci，之后在表里插入兩條數(shù)據(jù)：

mysql> desc test;
+--------------+-------------+------+-----+---------+-------+
| Field        | Type        | Null | Key | Default | Extra |
+--------------+-------------+------+-----+---------+-------+
| id           | int         | NO   | PRI | NULL    |       |
| name_char    | char(20)    | YES  |     | NULL    |       |
| name_varchar | varchar(20) | YES  |     | NULL    |       |
+--------------+-------------+------+-----+---------+-------+
3 rows in set (0.01 sec)

INSERT INTO `test` VALUES (1, 'char1', 'varchar1');
INSERT INTO `test` VALUES (2, 'char2     ', 'varchar2     ');

char 和 varchar 的區(qū)別

首先看一下官方對于 char 類型和 varchar 類型的介紹，以下內(nèi)容摘自【11.3.2 The CHAR and VARCHAR Types】

The length of a CHAR column is fixed to the length that you declare when you create the table. The length can be any value from 0 to 255. When CHAR values are stored, they are right-padded with spaces to the specified length. When CHAR values are retrieved, trailing spaces are removed unless the PAD_CHAR_TO_FULL_LENGTH SQL mode is enabled.
Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 65,535. The effective maximum length of a VARCHAR is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used.

通過以上我們可以得知以下幾部分內(nèi)容：

char 類型長度為 0-255，varchar 類型長度為 0-65535，char 和 varchar 類型的長度其實還會受到內(nèi)容長度的影響，這里我們不深究。
char 類型為定長字段，存儲時會向右填充空格至聲明的長度；varchar 類型為變長字段，存儲時聲明的只是可存儲的最長內(nèi)容，實際長度與內(nèi)容有關(guān)。
在 sql mode 中未開啟 PAD_CHAR_TO_FULL_LENGTH 時，char 類型在查詢時會在忽略尾部空格（關(guān)于 sql mode 的資料請移步【5.1.11 Server SQL Modes】，這里我們不深究）

下面的查詢結(jié)果中第一行是都沒有空格的結(jié)果，第二行是都帶有 5 個空格的結(jié)果，可以看到 char 類型無論帶不帶空格都只會返回基本的字符。

mysql> select concat("(",name_char,")") name_char, concat("(",name_varchar,")") name_varchar from test;
+-----------+-----------------+
| name_char | name_varchar    |
+-----------+-----------------+
| (char1)   | (varchar1)      |
| (char2)   | (varchar2     ) |
+-----------+-----------------+
2 rows in set (0.01 sec)

第一行好理解，你存進(jìn)去的時候沒帶空格，數(shù)據(jù)庫自己填充上了空格，總不能查出來的結(jié)果還變了吧；第二行則是入庫的時候字符串最后的字符和數(shù)據(jù)庫填充的字符是同一種，查詢的時候數(shù)據(jù)庫怎么分得清是你自己填的還是它填的呢，直接一刀切。而 varchar 類型因為不會被填充，所以查詢結(jié)果中完成的保留下了尾部空格。

varchar 對于尾部空格的處理

上節(jié)了解過 char 類型查詢時會忽略尾部空格，但是在實際使用中發(fā)現(xiàn) varchar 也有類似的規(guī)則，在查看文檔時發(fā)現(xiàn)有以下一段內(nèi)容，摘自【11.3.2 The CHAR and VARCHAR Types】

Values in CHAR, VARCHAR, and TEXT columns are sorted and compared according to the character set collation assigned to the column.
MySQL collations have a pad attribute of PAD SPACE, other than Unicode collations based on UCA 9.0.0 and higher, which have a pad attribute of NO PAD.

根據(jù)這一段描述，我們可以得知 char、varchar 和 text 內(nèi)容的排序和比較過程受排序規(guī)則影響，在 UCA 9.0.0 之前 pad 屬性默認(rèn)為 PAD SPACE，而之后的默認(rèn)屬性為 NO PAD。

在官方文檔中可以找到以下說明，摘自【Trailing Space Handling in Comparisons】

For nonbinary strings (CHAR, VARCHAR, and TEXT values), the string collation pad attribute determines treatment in comparisons of trailing spaces at the end of strings:

For PAD SPACE collations, trailing spaces are insignificant in comparisons; strings are compared without regard to trailing spaces.
NO PAD collations treat trailing spaces as significant in comparisons, like any other character.

這一段主要描述 char、varchar 和 text 類型在比較時，如果排序規(guī)則的 pad 屬性為 PAD SPACE 則會忽略尾部空格，NO PAD 屬性則不會，而這正解釋了最初的問題。我們通過修改列的排序規(guī)則驗證以下，首先看一下當(dāng)前使用 PAD SPACE 時的查詢結(jié)果。

mysql> show full columns from test;
+--------------+-------------+--------------------+------+-----+---------+-------+---------------------------------+---------+
| Field        | Type        | Collation          | Null | Key | Default | Extra | Privileges                      | Comment |
+--------------+-------------+--------------------+------+-----+---------+-------+---------------------------------+---------+
| id           | int         | NULL               | NO   | PRI | NULL    |       | select,insert,update,references |         |
| name_char    | char(20)    | utf8mb4_unicode_ci | YES  |     | NULL    |       | select,insert,update,references |         |
| name_varchar | varchar(20) | utf8mb4_unicode_ci | YES  |     | NULL    |       | select,insert,update,references |         |
+--------------+-------------+--------------------+------+-----+---------+-------+---------------------------------+---------+
3 rows in set (0.01 sec)

mysql> select * from test where name_varchar = 'varchar2';
+----+-----------+---------------+
| id | name_char | name_varchar  |
+----+-----------+---------------+
|  2 | char2     | varchar2      |
+----+-----------+---------------+
1 row in set (0.01 sec)

可以看到在 PAD SPACE 屬性下可以通過varchar2查詢到varchar2，說明比較時忽略的尾部的空格，我們將 name_varchar 的排序規(guī)則切換為 UCA 9.0.0 以后版本再來看一下結(jié)果。

mysql> ALTER TABLE test CHANGE name_varchar name_varchar VARCHAR(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;
Query OK, 0 rows affected (0.05 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> show full columns from test;
+--------------+-------------+--------------------+------+-----+---------+-------+---------------------------------+---------+
| Field        | Type        | Collation          | Null | Key | Default | Extra | Privileges                      | Comment |
+--------------+-------------+--------------------+------+-----+---------+-------+---------------------------------+---------+
| id           | int         | NULL               | NO   | PRI | NULL    |       | select,insert,update,references |         |
| name_char    | char(20)    | utf8mb4_unicode_ci | YES  |     | NULL    |       | select,insert,update,references |         |
| name_varchar | varchar(20) | utf8mb4_0900_ai_ci | YES  |     | NULL    |       | select,insert,update,references |         |
+--------------+-------------+--------------------+------+-----+---------+-------+---------------------------------+---------+
3 rows in set (0.01 sec)

mysql> select * from test where name_varchar = 'varchar2';
Empty set (0.00 sec)

與預(yù)期一樣，切換排序規(guī)則后，尾部空格參與比較，已經(jīng)不能通過varchar2查詢到varchar2了。

確定排序規(guī)則的 pad 屬性

那接下來的問題是如何判斷當(dāng)前的排序規(guī)則是基于 UCA 9.0.0 之前還是之后的版本呢？其實在 mysql 8.x 版本中，排序規(guī)則保存在 information_schema 庫的 COLLATIONS 表中，可以通過以下語句查詢對應(yīng)的 pad 屬性值，例如我們一開始選擇的 utf8mb4_unicode_ci。

mysql> select collation_name, pad_attribute from information_schema.collations where collation_name = 'utf8mb4_unicode_ci';
+--------------------+---------------+
| collation_name     | pad_attribute |
+--------------------+---------------+
| utf8mb4_unicode_ci | PAD SPACE     |
+--------------------+---------------+
1 row in set (0.00 sec)

除了查詢數(shù)據(jù)庫以外，還可以通過排序規(guī)則的名稱進(jìn)行區(qū)別，在官方文檔中有以下一段描述，摘自【Unicode Collation Algorithm (UCA) Versions】

MySQL implements the xxx_unicode_ci collations according to the Unicode Collation Algorithm (UCA) described at http://www.unicode.org/reports/tr10/. The collation uses the version-4.0.0 UCA weight keys: http://www.unicode.org/Public/UCA/4.0.0/allkeys-4.0.0.txt. The xxx_unicode_ci collations have only partial support for the Unicode Collation Algorithm.

Unicode collations based on UCA versions higher than 4.0.0 include the version in the collation name. Examples:

utf8mb4_unicode_520_ci is based on UCA 5.2.0 weight keys (http://www.unicode.org/Public/UCA/5.2.0/allkeys.txt),
utf8mb4_0900_ai_ci is based on UCA 9.0.0 weight keys (http://www.unicode.org/Public/UCA/9.0.0/allkeys.txt).

可以看出，名稱類似 xxx_unicode_ci 的排序規(guī)則是基于 UCA 4.0.0 的，而 xxx_520_ci 是基于 UCA 5.2.0，xxx_0900_ci 是基于 UCA 9.0.0 的。通過查詢數(shù)據(jù)庫驗證，排序規(guī)則中包含 0900 字樣的 pad 屬性均為 NO PAD，符合以上描述。

需要注意的是 binary 排序規(guī)則的 pad 屬性為 NO PAD，這里其實不是個例外，因為 char、varchar 和 text 類型都?xì)w類為nonbinary。

到此這篇關(guān)于MySQLVarchar類型尾部空格的文章就介紹到這了,更多相關(guān)MySQLVarchar類型尾部空格內(nèi)容請搜索腳本之家以前的文章或繼續(xù)瀏覽下面的相關(guān)文章希望大家以后多多支持腳本之家！

您可能感興趣的文章: