MySQL order by實(shí)現(xiàn)原理分析和Filesort優(yōu)化方式

更新時(shí)間：2023年12月01日 16:57:04 作者：lijingkuan

這篇文章主要介紹了MySQL order by實(shí)現(xiàn)原理分析和Filesort優(yōu)化方式,具有很好的參考價(jià)值,希望對(duì)大家有所幫助,如有錯(cuò)誤或未考慮完全的地方,望不吝賜教

1.利用有序索引獲取有序數(shù)據(jù)

取出滿足過濾條件、作為排序條件的字段，以及可以直接定位到行數(shù)據(jù)的行指針信息，在 Sort Buffer 中進(jìn)行實(shí)際的排序操作，然后利用排好序的數(shù)據(jù)根據(jù)行指針信息返回表中取得客戶端請(qǐng)求的其他字段的數(shù)據(jù)，再返回給客戶端。

這種方式，在使用explain分析查詢的時(shí)候，顯示Using index。而文件排序顯示Using filesort。

注意：

MySQL在查詢時(shí)最多只能使用一個(gè)索引。

因此，如果WHERE條件已經(jīng)占用了索引，那么在排序中就不使用索引了。（現(xiàn)在已經(jīng)可以使用多個(gè)索引了）

1.1 按照索引對(duì)結(jié)果進(jìn)行排序：order by 使用索引時(shí)有條件

1) 返回選擇的字段，即只包括在有選擇的此列上（select后面的字段），不一定適應(yīng)*的情況)：

CREATE TABLE `test` (  
  `id` int(11) NOT NULL AUTO_INCREMENT,  
  `rdate` datetime NOT NULL,  
  `inventid` int(11) NOT NULL,  
  `customerid` int(11) NOT NULL,  
  `staffid` int(11) NOT NULL,  
  `data` varchar(20) NOT NULL,  
  PRIMARY KEY (`id`),  
  UNIQUE KEY `rdate`(`rdate`,`inventid`,`customerid`),  
  KEY `inventid` (`inventid`),  
  KEY `customerid` (`customerid`),  
  KEY `staffid` (`staffid`)  
) ENGINE=InnoDB AUTO_INCREMENT=27 DEFAULT CHARSET=latin1

如：

root@mysql5601 14:58:  [lijk]> explain select inventid from test where rdate='2011-12-1400:00:00' order by  inventid , customerid;
+----+-------------+-------+------+---------------+-------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys | key   | key_len | ref   | rows | Extra                    |
+----+-------------+-------+------+---------------+-------+---------+-------+------+--------------------------+
|  1 | SIMPLE      | test  | ref  | rdate         | rdate | 5       | const |    1 | Using where; Using index |
+----+-------------+-------+------+---------------+-------+---------+-------+------+--------------------------+
1 row in set, 5 warnings (0.00 sec)

Select選擇的列使用索引，而下面不使用索引：

root@mysql5601 15:00:  [lijk]> explain select * from test where rdate='2011-12-14 00:00:00'order by  inventid , customerid ;
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key   | key_len | ref   | rows | Extra       |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------------+
|  1 | SIMPLE      | test  | ref  | rdate         | rdate | 5       | const |    1 | Using where |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------------+
1 row in set (0.00 sec)

2) 只有當(dāng)ORDER BY中所有的列必須包含在相同的索引，并且索引的順序和order by子句中的順序完全一致，并且所有列的排序方向（升序或者降序）一樣才有，（混合使用ASC模式和DESC模式則不使用索引）

root@mysql5601 15:00:  [lijk]> explain select inventid from test order by rdate, inventid ; 
+----+-------------+-------+-------+---------------+-------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key   | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+-------+---------+------+------+-------------+
|  1 | SIMPLE      | test  | index | NULL          | rdate | 13      | NULL |    1 | Using index |
+----+-------------+-------+-------+---------------+-------+---------+------+------+-------------+
1 row in set (0.00 sec)

root@mysql5601 15:01:  [lijk]> explain select inventid from test where rdate="2011-12-16" order by  inventid ,staffid;  
+----+-------------+-------+------+---------------+-------+---------+-------+------+-----------------------------+
| id | select_type | table | type | possible_keys | key   | key_len | ref   | rows | Extra                       |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-----------------------------+
|  1 | SIMPLE      | test  | ref  | rdate         | rdate | 5       | const |    1 | Using where; Using filesort |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-----------------------------+
1 row in set (0.00 sec)

由于rdate, inventid使用了同一個(gè)索引。排序使用到了索引。這個(gè)也是滿足了前綴索引。但是order by inventid ,staffid;就不是使用了索引，因?yàn)閟taffid和inventid不是同一個(gè)索引

3) where 語句與ORDER BY語句組合滿足最左前綴：

root@mysql5601 15:02:  [lijk]> explain select inventid from test where rdate="2011-12-16" order by  inventid ;
+----+-------------+-------+------+---------------+-------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys | key   | key_len | ref   | rows | Extra                    |
+----+-------------+-------+------+---------------+-------+---------+-------+------+--------------------------+
|  1 | SIMPLE      | test  | ref  | rdate         | rdate | 5       | const |    1 | Using where; Using index |
+----+-------------+-------+------+---------------+-------+---------+-------+------+--------------------------+
1 row in set (0.00 sec)

4) 如果查詢聯(lián)接了多個(gè)表，只有在order by子句的所有列引用的是第一個(gè)表的列才可以。

5) 在其他的情況下，mysql使用文件排序例如：

1） where語句與order by語句，使用了不同的索引
2）檢查的行數(shù)過多，且沒有使用覆蓋索引
3） ORDER BY中的列不包含在相同的索引，也就是使用了不同的索引
4）對(duì)索引列同時(shí)使用了ASC和DESC
5） where語句或者ORDER BY語句中索引列使用了表達(dá)式，包括函數(shù)表達(dá)式
6） where 語句與ORDER BY語句組合滿足最左前綴，但where語句中使用了條件查詢。查見第10句,雖然where與order by構(gòu)成了索引最左有綴的條件，但是where子句中使用的是條件查詢。

root@mysql5601 15:02:  [lijk]> explain select inventid from test where  rdate>"2011-12-16" order by  inventid;
+----+-------------+-------+-------+---------------+-------+---------+------+------+------------------------------------------+
| id | select_type | table | type  | possible_keys | key   | key_len | ref  | rows | Extra                                    |
+----+-------------+-------+-------+---------------+-------+---------+------+------+------------------------------------------+
|  1 | SIMPLE      | test  | index | rdate         | rdate | 13      | NULL |    1 | Using where; Using index; Using filesort |
+----+-------------+-------+-------+---------------+-------+---------+------+------+------------------------------------------+
1 row in set (0.00 sec)

7）當(dāng)使用left join，使用右邊的表字段排序

2.文件排序

這個(gè) filesort 并不是說通過磁盤文件進(jìn)行排序，而只是告訴我們進(jìn)行了一個(gè)排序操作。

即在MySQL Query Optimizer 所給出的執(zhí)行計(jì)劃(通過 EXPLAIN 命令查看)中被稱為文件排序（filesort）

文件排序是通過相應(yīng)的排序算法,將取得的數(shù)據(jù)在內(nèi)存中進(jìn)行排序: MySQL需要將數(shù)據(jù)在內(nèi)存中進(jìn)行排序，所使用的內(nèi)存區(qū)域也就是我們通過sort_buffer_size 系統(tǒng)變量所設(shè)置的排序區(qū)。這個(gè)排序區(qū)是每個(gè)Thread 獨(dú)享的，所以說可能在同一時(shí)刻在MySQL 中可能存在多個(gè) sort buffer 內(nèi)存區(qū)域。

在MySQL中filesort 的實(shí)現(xiàn)算法實(shí)際上是有兩種：

雙路排序：是首先根據(jù)相應(yīng)的條件取出相應(yīng)的排序字段和可以直接定位行數(shù)據(jù)的行指針信息，然后在sort buffer 中進(jìn)行排序。

單路排序：是一次性取出滿足條件行的所有字段，然后在sort buffer中進(jìn)行排序。

在MySQL4.1版本之前只有第一種排序算法雙路排序，第二種算法是從MySQL4.1開始的改進(jìn)算法，主要目的是為了減少第一次算法中需要兩次訪問表數(shù)據(jù)的 IO 操作，將兩次變成了一次，但相應(yīng)也會(huì)耗用更多的sortbuffer 空間。

當(dāng)然，MySQL4.1開始的以后所有版本同時(shí)也支持第一種算法。

MySQL主要通過比較我們所設(shè)定的系統(tǒng)參數(shù) max_length_for_sort_data的大小和Query 語句所取出的字段類型大小總和來判定需要使用哪一種排序算法。

如果 max_length_for_sort_data更大，則使用第二種優(yōu)化后的算法，反之使用第一種算法。

所以如果希望 ORDER BY 操作的效率盡可能的高，一定要主義max_length_for_sort_data 參數(shù)的設(shè)置。

曾經(jīng)就有同事的數(shù)據(jù)庫出現(xiàn)大量的排序等待，造成系統(tǒng)負(fù)載很高，而且響應(yīng)時(shí)間變得很長，最后查出正是因?yàn)镸ySQL 使用了傳統(tǒng)的第一種排序算法而導(dǎo)致，在加大了max_length_for_sort_data 參數(shù)值之后，系統(tǒng)負(fù)載馬上得到了大的緩解，響應(yīng)也快了很多。

2.1 MySQL 需要使用filesort 實(shí)現(xiàn)排序的實(shí)例

假設(shè)有 Table A 和 B 兩個(gè)表結(jié)構(gòu)分別如下：

# mysql

>show create table A\G
 　　*************************** 1. row ***************************
 　　Table: A
 　　Create Table: CREATE TABLE `A` (
 　　`id` int(11) NOT NULL default '0',
  　 `c2` char(2) default NULL,
 　　`c3` varchar(16) default NULL, 
 　　`c4` datetime default NULL, 
 　　PRIMARY KEY (`id`) 
 　　) ENGINE=MyISAM DEFAULT CHARSET=utf8

#:mysql

> show create table B\G
 　　*************************** 1. row ***************************  　　Table: B
 　　Create Table: CREATE TABLE `B` ( 
 　　`id` int(11) NOT NULL default '0', 
 　　`c2` char(2) default NULL,
 　　`c3` varchar(16) default NULL, 
 　　PRIMARY KEY (`id`),
 　　KEY `B_c2_ind` (`c2`)
 　　) ENGINE=MyISAM DEFAULT CHARSET=utf8

A.c2不是索引將使用：

root@mysql5601 15:02:  [lijk]> EXPLAIN SELECT A.* FROM A,B WHERE A.id >2 AND A.c2 <5 AND A.c2 = B.c2 ORDER BY A.c2\G

　　*************************** 1. row ***************************
　　id: 1
　　select_type: SIMPLE
　　table: A
　　type: range
　　possible_keys: PRIMARY
　　key: PRIMARY
　　key_len: 4
　　ref: NULL
　　rows: 3
　　Extra: Using where; Using filesort



*************************** 2. row ***************************
　　id: 1
　　select_type: SIMPLE
　　table: B
　　type: ref
　　possible_keys: B_c2_ind
　　key: B_c2_ind
　　key_len: 7
　　ref: example.A.c2
　　rows: 2
　　Extra: Using where; Using index

2.2 MySQL 需要使用Using temporary 臨時(shí)表來filesort

如果order by的子句只引用了聯(lián)接中的第一個(gè)表，MySQL會(huì)先對(duì)第一個(gè)表進(jìn)行排序，然后進(jìn)行聯(lián)接。

也就是expain中的Extra的Using Filesort.否則MySQL先把結(jié)果保存到臨時(shí)表(Temporary Table),然后再對(duì)臨時(shí)表的數(shù)據(jù)進(jìn)行排序.此時(shí)expain中的Extra的顯示Using temporary Using Filesort.

例如如果我們的排序數(shù)據(jù)如果是兩個(gè)(或者更多個(gè)) Table 通過 Join所得出的,如下例所示：

root@mysql5601 15:02:  [lijk]> explain select A.* from A,B
where A.id > 2 and A.c2 < 5 and A.c2 = B.c2 order by B.c3\G

　　*************************** 1. row***************************

　　id: 1
  　select_type: SIMPLE
　　table: A
　　type: range
　　possible_keys: PRIMARY
　　key: PRIMARY
　　key_len: 4
　　ref: NULL
  　rows: 3

Extra: Using where; Using temporary; Using filesort

　　*************************** 2. row ***************************
　　id: 1
　　select_type: SIMPLE
　　table: B

　　type: ref
　　possible_keys: B_c2_ind
　　key: B_c2_ind
　　key_len: 7
　　ref: example.A.c2
　　rows: 2
　　Extra: Using where

實(shí)際執(zhí)行過程應(yīng)該是如下圖所示：

3.優(yōu)化Filesort

當(dāng)無法避免排序操作時(shí)，又該如何來優(yōu)化呢？很顯然，應(yīng)該盡可能讓 MySQL 選擇使用第二種單路算法來進(jìn)行排序。這樣可以減少大量的隨機(jī)IO操作，很大幅度地提高排序工作的效率。

1. 加大 max_length_for_sort_data 參數(shù)的設(shè)置

在 MySQL 中，決定使用老式排序算法還是改進(jìn)版排序算法是通過參數(shù) max_length_for_ sort_data 來決定的。

當(dāng)所有返回字段的最大長度小于這個(gè)參數(shù)值時(shí)，MySQL 就會(huì)選擇改進(jìn)后的排序算法，反之，則選擇老式的算法。

所以，如果有充足的內(nèi)存讓MySQL 存放須要返回的非排序字段，就可以加大這個(gè)參數(shù)的值來讓 MySQL 選擇使用改進(jìn)版的排序算法。

2. 去掉不必要的返回字段

當(dāng)內(nèi)存不是很充裕時(shí)，不能簡單地通過強(qiáng)行加大上面的參數(shù)來強(qiáng)迫 MySQL 去使用改進(jìn)版的排序算法，否則可能會(huì)造成 MySQL 不得不將數(shù)據(jù)分成很多段，然后進(jìn)行排序，這樣可能會(huì)得不償失。

此時(shí)就須要去掉不必要的返回字段，讓返回結(jié)果長度適應(yīng) max_length_for_sort_data 參數(shù)的限制。

3. 增大 sort_buffer_size 參數(shù)設(shè)置

增大 sort_buffer_size 并不是為了讓 MySQL選擇改進(jìn)版的排序算法，而是為了讓MySQL盡量減少在排序過程中對(duì)須要排序的數(shù)據(jù)進(jìn)行分段，因?yàn)榉侄螘?huì)造成 MySQL 不得不使用臨時(shí)表來進(jìn)行交換排序。