處理Hive中的數(shù)據(jù)傾斜的方法
1 groupby(大表分組-局部聚合+全局聚合)
示例1:
select label,sum(cnt) as all from ( select rd,label,sum(1) as cnt from ( select id,label,round(rand(),2) as rd,value from tmp1 ) as tmp group by rd,label ) as tmp group by label;
示例2:
select split(new_source,'\\_')[0] as source ,sum(cnt) as cnt from (select concat(source,'_', rand()*100) as new_source ,count(1) as cnt from test_table where day ='2022-01-01' group by concat(source,'_', rand()*100) )tt group by split(new_source,'\\_')[0]
2 join(大中表Join - 加salt + 小表膨脹)
示例1:
select label,sum(value) as all from ( select rd,label,sum(value) as cnt from ( select tmp1.rd as rd,tmp1.label as label,tmp1.value*tmp2.value as value from ( select id,round(rand(),1) as rd,label,value from tmp1 ) as tmp1 join ( select id,rd,label,value from tmp2 lateral view explode(split('0.0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9',',')) mytable as rd ) as tmp2 on tmp1.rd = tmp2.rd and tmp1.label = tmp2.label ) as tmp1 group by rd,label ) as tmp1 group by label;
示例2:
select source ,source_name ,sum(cnt) as cnt from (select t1.source ,new_source ,nvl(source_name,'未知') as source_name ,count(imei) as cnt from (select imei ,source ,concat(cast(rand()*10 as int ),'_',source ) as new_source from test_table_1 where day ='2022-01-01' ) t1 inner join ( select source_name ,concat(preflix,'_',source) as new_source from test_table_1 where day ='2022-01-01' lateral view explode(split('0,1,2,3,4,5,6,7,8,9,10',','))b as preflix ) t2 on t1.new_source =t2.new_source group by t1.source ,new_source ,nvl(source_name,'未知') ) tta group by source ,source_name
3 雙大表Join - 抽樣取傾斜key+BroadJoin
##優(yōu)化前: create table test.tmp_table_test_all as select imei ,lable_id ,nvl(label_name,'未知') from tmp_table_1 t1 left join (select lable_id ,label_name from tmp_table_2 where day ='2024-01-01') t2 on t1.lable_id =t2.lable_id where t1.day ='2024-01-01' ; ## 優(yōu)化后 : create table test.tmp_table_test_all_new as with tmp_table_test_1 as (select lable_id ,count(1) as cnt from tmp_table_1 t1 tablesample(5 percent) --抽樣取5%的數(shù)據(jù),減少table scan的量 group by lable_id order by cnt desc limit 100 ) select imei ,lable_id ,nvl(label_name,'未知') as label_name from tmp_table_1 t1 left join tmp_table_test_1 t2 on t1.lable_id =t2.lable_id left join (select lable_id ,label_name from tmp_table_2 where day ='2024-01-01') t3 on t1.lable_id =t3.lable_id where t1.day ='2024-01-01' and t2.lable_id is null union all select imei ,lable_id ,nvl(label_name,'未知') as label_name from tmp_table_1 t1 inner join (select lable_id from tmp_table_test_1 t1 left join tmp_table_2 t2 on t1.lable_id =t2.lable_id where t2.day ='2024-01-01') t3 on t1.lable_id =t3.lable_id where t1.day ='2024-01-01' ;
4 小結(jié)
到此這篇關(guān)于處理Hive中的數(shù)據(jù)傾斜的方法的文章就介紹到這了,更多相關(guān)處理Hive數(shù)據(jù)傾斜內(nèi)容請(qǐng)搜索腳本之家以前的文章或繼續(xù)瀏覽下面的相關(guān)文章希望大家以后多多支持腳本之家!
相關(guān)文章
mycat在windows環(huán)境下的安裝和啟動(dòng)
這篇文章主要介紹了mycat在windows環(huán)境下的安裝和啟動(dòng)過程,需要的朋友參考下吧2018-03-03idea中連接數(shù)據(jù)庫時(shí)出現(xiàn)SSL錯(cuò)誤的問題
這篇文章主要介紹了idea中連接數(shù)據(jù)庫是出現(xiàn)SSL錯(cuò)誤的問題,本文給大家介紹的非常詳細(xì),對(duì)大家的學(xué)習(xí)或工作具有一定的參考借鑒價(jià)值,需要的朋友可以參考下2020-10-10數(shù)據(jù)庫觸發(fā)器(Trigger)的一點(diǎn)使用心得
最近了解了一下數(shù)據(jù)庫觸發(fā)器,并做一點(diǎn)實(shí)際的應(yīng)用,在翻看其概念的時(shí)候,還是本著從理解的角度來學(xué)習(xí)的,但是,到了實(shí)際的應(yīng)用場(chǎng)景中,還是有一些特別注意的地方的,下面是自己在應(yīng)用中的幾點(diǎn)體會(huì)2009-07-07Linux下開啟和配置OpenGauss數(shù)據(jù)庫遠(yuǎn)程連接的教程詳解
openGauss是一款開源關(guān)系型數(shù)據(jù)庫管理系統(tǒng),采用木蘭寬松許可證v2發(fā)行,本文主要為大家介紹了Linux系統(tǒng)中如何開啟和配置OpenGauss數(shù)據(jù)庫的遠(yuǎn)程連接,需要的小伙伴可以參考下2023-12-12IntellJ Idea 2020版添加sqlite數(shù)據(jù)庫的方法
這篇文章主要介紹了IntellJ Idea 2020版添加sqlite數(shù)據(jù)庫的方法,本文通過圖文并茂的形式給大家介紹的非常詳細(xì),對(duì)大家的學(xué)習(xí)或工作具有一定的參考借鑒價(jià)值,需要的朋友可以參考下2020-11-11一文弄懂?dāng)?shù)據(jù)庫設(shè)計(jì)的三范式
面試中經(jīng)常會(huì)問到的數(shù)據(jù)庫三范式指的是什么,本文主要介紹了數(shù)據(jù)庫設(shè)計(jì)的三范式,對(duì)大家的學(xué)習(xí)或者工作具有一定的參考學(xué)習(xí)價(jià)值,需要的朋友們下面隨著小編來一起學(xué)習(xí)學(xué)習(xí)吧2023-07-07sql連接查詢語句中on、where篩選的區(qū)別總結(jié)
接觸Sql語句時(shí)間挺長(zhǎng)時(shí)間了,聽他人說過sql語句的連接查詢,但一直沒有認(rèn)真研究和使用過!下面這篇文章主要給大家介紹了關(guān)于sql連接查詢語句中on、where篩選兩者之間區(qū)別的相關(guān)資料,文中通過圖文介紹的非常詳細(xì),需要的朋友們下面來一起看看吧。2017-07-07