MySql?字符集不同導(dǎo)致?left?join?慢查詢的問(wèn)題解決
在 MySql 建表時(shí)候一般會(huì)指定字符集,大多數(shù)情況下為了更好的兼容性無(wú)腦選了 utf8mb4。但是有時(shí)會(huì)因?yàn)檫x錯(cuò),或歷史遺留問(wèn)題,導(dǎo)致使用了 utf8 字符集。當(dāng)兩個(gè)表的字符集不一樣,在使用字符型字段進(jìn)行表連接查詢時(shí),就需要特別注意下查詢耗時(shí)是否符合預(yù)期。
有次使用 left join 寫(xiě)一個(gè) SQL,發(fā)現(xiàn)用時(shí)明顯超過(guò)預(yù)期,經(jīng)過(guò)一頓折騰才發(fā)現(xiàn)是兩個(gè)表字符集不一樣,特此記錄一下。
問(wèn)題分析
mysql> SELECT COUNT( *) from app_bind_rel t left join app_config_control_sn p on t.host_sn = p.host_sn ; +-----------+ | COUNT( *) | +-----------+ | 13447 | +-----------+ 1 row in set (0.89 sec)
例如上面的 SQL,左表 1W 條數(shù)據(jù),右表 400 多條數(shù)據(jù),在 host_sn 字段上都有索引,查詢竟然用了近 900ms,怎么會(huì)這么慢?
mysql> explain SELECT COUNT( *) from app_bind_rel t left join app_config_control_sn p on t.host_sn = p.host_sn ; +----+-------------+-------+------------+-------+---------------+-------------+---------+------+-------+----------+-----------------------------------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+-------+---------------+-------------+---------+------+-------+----------+-----------------------------------------------------------------+ | 1 | SIMPLE | t | NULL | index | NULL | idx_host_sn | 122 | NULL | 10791 | 100.00 | Using index | | 1 | SIMPLE | p | NULL | index | NULL | idx_host_sn | 152 | NULL | 457 | 100.00 | Using where; Using index; Using join buffer (Block Nested Loop) | +----+-------------+-------+------------+-------+---------------+-------------+---------+------+-------+----------+-----------------------------------------------------------------+ 2 rows in set, 1 warning (0.00 sec)
查看下執(zhí)行計(jì)劃,的確是使用了索引,但是細(xì)看 Extra 列發(fā)現(xiàn)較正常的連表查詢多了“Using join buffer (Block Nested Loop)”這一信息,這個(gè)具體是什么意思我們后面再說(shuō)。
然后我們?cè)倏聪略敿?xì)的執(zhí)行計(jì)劃,使用 explain formart=json。
{ "query_block": { "select_id": 1, "cost_info": { "query_cost": "988640.52" }, "nested_loop": [ { "table": { "table_name": "t", "access_type": "index", "key": "idx_host_sn", "used_key_parts": [ "host_sn" ], "key_length": "122", "rows_examined_per_scan": 10791, "rows_produced_per_join": 10791, "filtered": "100.00", "using_index": true, "cost_info": { "read_cost": "161.00", "eval_cost": "2158.20", "prefix_cost": "2319.20", "data_read_per_join": "2M" }, "used_columns": [ "host_sn" ] } }, { "table": { "table_name": "p", "access_type": "index", "key": "idx_host_sn", "used_key_parts": [ "host_sn" ], "key_length": "152", "rows_examined_per_scan": 457, "rows_produced_per_join": 4931487, "filtered": "100.00", "using_index": true, "using_join_buffer": "Block Nested Loop", "cost_info": { "read_cost": "23.92", "eval_cost": "986297.40", "prefix_cost": "988640.52", "data_read_per_join": "865M" }, "used_columns": [ "host_sn" ], "attached_condition": "<if>(is_not_null_compl(p), (`db0`.`t`.`host_sn` = convert(`db0`.`p`.`host_sn` using utf8mb4)), true)" } } ] } }
特別需要關(guān)注的是這一對(duì) KV
"attached_condition": "<if>(is_not_null_compl(p), (`collection_bullet_0000`.`t`.`host_sn` = convert(`collection_bullet_0000`.`p`.`host_sn` using utf8mb4)), true)"
看字面意思就是當(dāng) p 表不為空的時(shí)候,執(zhí)行表連接需要先將 p 表的 host_sn 字段轉(zhuǎn)變?yōu)?utf8mb4 字符集。我們應(yīng)該都知道在表連接中使用了函數(shù)的話,是無(wú)法使用索引的。
所以再回到上面我看到的“Using join buffer (Block Nested Loop)”問(wèn)題,來(lái)解釋下這是一個(gè)什么過(guò)程。
Nested-Loop Join
MySql 官網(wǎng)對(duì) Nested-Loop Join 有做過(guò)解釋,其實(shí)做開(kāi)發(fā)的同學(xué)看到名字就大體知道是啥,不就是循環(huán)嵌套嘛。
MySql 中分為 Nested-Loop Join 算法跟 Block Nested-Loop Join 算法。
例如,有如下三個(gè)表,t1、t2、t3 使用了這三種 join type。
Table Join Type
t1 range
t2 ref
t3 ALL
當(dāng)使用 Nested-Loop Join 算法時(shí),其 join 過(guò)程如下所示,其實(shí)就是簡(jiǎn)單的三層循環(huán)。
for each row in t1 matching range { for each row in t2 matching reference key { for each row in t3 { if row satisfies join conditions, send to client } } }
Block Nested-Loop Join(BNL) 算法是對(duì) Nested-Loop Join 算法的一種優(yōu)化。BNL 算法緩沖外部循環(huán)中讀取的行來(lái)減少內(nèi)部循環(huán)中讀取表的次數(shù)。例如,將 10 行數(shù)據(jù)讀取到緩沖器中,并且將緩沖器傳遞到下一個(gè)循環(huán)內(nèi)部,內(nèi)部循環(huán)中讀取的每一行與緩沖器中的所有 10 行進(jìn)行比較。這將使讀取內(nèi)部表的次數(shù)減少一個(gè)數(shù)量級(jí)。
for each row in t1 matching range { for each row in t2 matching reference key { store used columns from t1, t2 in join buffer if buffer is full { for each row in t3 { for each t1, t2 combination in join buffer { if row satisfies join conditions, send to client } } empty join buffer } } } if buffer is not empty { for each row in t3 { for each t1, t2 combination in join buffer { if row satisfies join conditions, send to client } } }
算法實(shí)現(xiàn)如上,只有當(dāng) “join buffer” 滿的時(shí)候才會(huì)觸發(fā) t3 表的讀取,如果 “join buffer” 的 size = 10 那么就可以減少 10 倍的 t3 表被讀取次數(shù),從內(nèi)存中讀取數(shù)據(jù)的效率顯然要比從磁盤(pán)讀取的效率高的多。從而提升 join 的效率。
但其實(shí)再好的優(yōu)化畢竟也是嵌套循環(huán),做開(kāi)發(fā)的同學(xué)應(yīng)該都知道 O(N²) 的時(shí)間復(fù)雜度是無(wú)法接受的。這也是我們這個(gè)查詢這么慢的根因。
解決辦法
解決辦法其實(shí)很簡(jiǎn)單,修改右表的字符集就可以解決。
在變更數(shù)據(jù)集之前我們先用 show table status 查看下當(dāng)前表的狀態(tài)。
mysql> show table status like 'app_config_control_sn'; +-----------------------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-----------------+----------+----------------+---------+ | Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation | Checksum | Create_options | Comment | +-----------------------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-----------------+----------+----------------+---------+ | app_config_control_sn | InnoDB | 10 | Dynamic | 457 | 143 | 65536 | 0 | 32768 | 0 | 1041 | 2023-04-17 03:25:45 | 2023-04-17 03:27:24 | NULL | utf8_general_ci | NULL | | SN | +-----------------------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-----------------+----------+----------------+---------+ 1 row in set (0.00 sec)
接著使用如下命令變更表的字符集。
mysql> ALTER TABLE app_config_control_sn CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci; Query OK, 457 rows affected (0.09 sec) Records: 457 Duplicates: 0 Warnings: 0
再次使用 show table status 命令查看下表的狀態(tài)。
mysql> show table status like 'app_config_control_sn'; +-----------------------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+--------------------+----------+----------------+---------+ | Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation | Checksum | Create_options | Comment | +-----------------------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+--------------------+----------+----------------+---------+ | app_config_control_sn | InnoDB | 10 | Dynamic | 457 | 143 | 65536 | 0 | 32768 | 0 | 1041 | 2023-04-17 03:50:11 | 2023-04-17 03:50:11 | NULL | utf8mb4_general_ci | NULL | | SN | +-----------------------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+--------------------+----------+----------------+---------+ 1 row in set (0.01 sec)
可以看到表的字符集已經(jīng)發(fā)生了變化,那我們?cè)俅螆?zhí)行開(kāi)始的 SQL 及 explain 語(yǔ)句,確認(rèn)下問(wèn)題是否已經(jīng)解決。
mysql> SELECT COUNT( *) from app_bind_rel t left join app_config_control_sn p on t.host_sn = p.host_sn ; +-----------+ | COUNT( *) | +-----------+ | 13447 | +-----------+ 1 row in set (0.03 sec) mysql> explain SELECT COUNT( *) from app_bind_rel t left join app_config_control_sn p on t.host_sn = p.host_sn ; +----+-------------+-------+------------+-------+---------------+-------------+---------+---------------+-------+----------+--------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+-------+---------------+-------------+---------+---------------+-------+----------+--------------------------+ | 1 | SIMPLE | t | NULL | index | NULL | idx_host_sn | 122 | NULL | 10791 | 100.00 | Using index | | 1 | SIMPLE | p | NULL | ref | idx_host_sn | idx_host_sn | 202 | db0.t.host_sn | 2 | 100.00 | Using where; Using index | +----+-------------+-------+------------+-------+---------------+-------------+---------+---------------+-------+----------+--------------------------+ 2 rows in set, 1 warning (0.00 sec)
可以看到耗時(shí)已經(jīng)只需要 30ms 左右,這個(gè)就比較符合預(yù)期,而在執(zhí)行計(jì)劃中也不再會(huì)有“Using join buffer (Block Nested Loop)”信息。
其他
mysql> SELECT COUNT( *) from app_bind_rel t join app_config_control_sn p on t.host_sn = p.host_sn ; +-----------+ | COUNT( *) | +-----------+ | 730 | +-----------+ 1 row in set (0.01 sec)
在沒(méi)有變更字符集之前,當(dāng)我們將 left join 修改為 join 的時(shí)候會(huì)發(fā)現(xiàn)耗時(shí)減少了 100 倍,只用了 10 ms,這是為什么呢?
mysql> explain SELECT COUNT( *) from app_bind_rel t join app_config_control_sn p on t.host_sn = p.host_sn ; +----+-------------+-------+------------+-------+---------------+-------------+---------+------+------+----------+--------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+-------+---------------+-------------+---------+------+------+----------+--------------------------+ | 1 | SIMPLE | p | NULL | index | NULL | idx_host_sn | 152 | NULL | 457 | 100.00 | Using index | | 1 | SIMPLE | t | NULL | ref | idx_host_sn | idx_host_sn | 122 | func | 1 | 100.00 | Using where; Using index | +----+-------------+-------+------------+-------+---------------+-------------+---------+------+------+----------+--------------------------+ 2 rows in set, 1 warning (0.00 sec)
查看執(zhí)行計(jì)劃,發(fā)現(xiàn)使用 join 的時(shí)候不會(huì)有 “Using join buffer (Block Nested Loop)”。再細(xì)看執(zhí)行計(jì)劃,發(fā)現(xiàn)驅(qū)動(dòng)表已經(jīng)由 t 表變?yōu)榱?p 表。
{ "query_block": { "select_id": 1, "cost_info": { "query_cost": "643.80" }, "nested_loop": [ { "table": { "table_name": "p", "access_type": "index", "key": "idx_host_sn", "used_key_parts": [ "host_sn" ], "key_length": "152", "rows_examined_per_scan": 457, "rows_produced_per_join": 457, "filtered": "100.00", "using_index": true, "cost_info": { "read_cost": "4.00", "eval_cost": "91.40", "prefix_cost": "95.40", "data_read_per_join": "82K" }, "used_columns": [ "host_sn" ] } }, { "table": { "table_name": "t", "access_type": "ref", "possible_keys": [ "idx_host_sn" ], "key": "idx_host_sn", "used_key_parts": [ "host_sn" ], "key_length": "122", "ref": [ "func" ], "rows_examined_per_scan": 1, "rows_produced_per_join": 457, "filtered": "100.00", "using_index": true, "cost_info": { "read_cost": "457.00", "eval_cost": "91.40", "prefix_cost": "643.80", "data_read_per_join": "117K" }, "used_columns": [ "host_sn" ], "attached_condition": "(`db0`.`t`.`host_sn` = convert(`db0`.`p`.`host_sn` using utf8mb4))" } } ] } }
查看詳細(xì)的執(zhí)行計(jì)劃,可以看到
"attached_condition": "(`collection_bullet_0000`.`t`.`host_sn` = convert(`collection_bullet_0000`.`p`.`host_sn` using utf8mb4))"
這對(duì) KV 依然是存在的,但是 "using_join_buffer": "Block Nested Loop" 已經(jīng)不存在了。這個(gè)其實(shí)主要是因?yàn)楫?dāng) p 表變?yōu)轵?qū)動(dòng)表的時(shí)候,會(huì)先將自己的 host_sn 字段轉(zhuǎn)為 utf8mb4 字符集,再與 t 表進(jìn)行關(guān)聯(lián)。t 表由于本來(lái)就是 utf8mb4 字符集且存在索引,就可以正常走數(shù)據(jù)庫(kù)索引了,所以查詢耗時(shí)也就大大降低。而使用 left join 時(shí)候,t 表作為驅(qū)動(dòng)表是無(wú)法優(yōu)化改變的。
可見(jiàn)在表連接中即使使用了函數(shù)也不一定就沒(méi)法走索引,關(guān)鍵還是要看用法及明確處理過(guò)程。
記得剛學(xué)習(xí)數(shù)據(jù)庫(kù)的時(shí)候,老師還特別強(qiáng)調(diào)驅(qū)動(dòng)表一定要寫(xiě)在左邊,而隨著數(shù)據(jù)庫(kù)技術(shù)的不斷迭代發(fā)展,數(shù)據(jù)庫(kù)已經(jīng)能更智能的自動(dòng)幫我們優(yōu)化處理過(guò)程,之前很多的數(shù)據(jù)庫(kù)規(guī)則也不需要了。
到此這篇關(guān)于MySql 字符集不同導(dǎo)致 left join 慢查詢的問(wèn)題解決的文章就介紹到這了,更多相關(guān)MySql left join 慢查詢內(nèi)容請(qǐng)搜索腳本之家以前的文章或繼續(xù)瀏覽下面的相關(guān)文章希望大家以后多多支持腳本之家!
相關(guān)文章
MySQL閃回(flashback)原理與實(shí)戰(zhàn)
摘要: 本文將介紹閃回原理,給出筆者的實(shí)戰(zhàn)經(jīng)驗(yàn),并對(duì)現(xiàn)存的閃回工具作比較。這是目前為止,MySQL 閃回領(lǐng)域第一篇全面深入的實(shí)戰(zhàn)總結(jié)。2017-01-01如何使用myisamchk和mysqlcheck工具快速修復(fù)損壞的MySQL數(shù)據(jù)庫(kù)文件
有時(shí)候數(shù)據(jù)庫(kù)突然就壞了很郁悶,用mysqlcheck.exe可以修復(fù)受損數(shù)據(jù)庫(kù)2020-01-01深入mysql存儲(chǔ)過(guò)程中表名使用參數(shù)傳入的詳解
本篇文章是對(duì)mysql存儲(chǔ)過(guò)程中表名使用參數(shù)傳入進(jìn)行了詳細(xì)的分析介紹,需要的朋友參考下2013-06-06解決mysql創(chuàng)建數(shù)據(jù)庫(kù)后出現(xiàn):Access denied for user ''root''@''%'' to dat
這篇文章主要給大家介紹了如何解決mysql在創(chuàng)建數(shù)據(jù)庫(kù)后出現(xiàn):Access denied for user 'root'@'%' to database 'xxx'的錯(cuò)誤提示,文中介紹的非常詳細(xì),需要的朋友可以參考借鑒,下面來(lái)一起看看吧。2017-05-05