快捷導(dǎo)航

postgresql實(shí)現(xiàn)對已有數(shù)據(jù)表分區(qū)處理的操作詳解

更新時間：2023年12月21日 08:28:10 作者：用戶3919820697669

這篇文章主要為大家詳細(xì)介紹了postgresql實(shí)現(xiàn)對已有數(shù)據(jù)表分區(qū)處理的操作的相關(guān)知識,文中的示例代碼講解詳細(xì),感興趣的小伙伴可以跟隨小編一起學(xué)習(xí)一下

背景：業(yè)務(wù)初期創(chuàng)建的一張表，有 20 個字段，沒有超長超大字段。隨著系統(tǒng)運(yùn)行，數(shù)據(jù)量來到了將近 1.3 億行約 60GB。目前整體穩(wěn)定，考慮到后續(xù)數(shù)據(jù)持續(xù)增長，打算先對表進(jìn)行分區(qū)處理?？紤]分區(qū)的主要因素是，這張表的數(shù)據(jù)主要是插入，和對最近插入數(shù)據(jù)的查詢，后續(xù)會有少量針對該表的全量查詢操作。

一、對已有數(shù)據(jù)進(jìn)行備份

創(chuàng)建備份表并將所有的數(shù)據(jù)備份到 t_test_back 表里。這種方式備份，只會復(fù)制表結(jié)構(gòu)和表數(shù)據(jù)，不會包含索引和約束。并且這種方式不會涉及到加鎖等操作，整體執(zhí)行很快，60GB 的數(shù)據(jù)大概在 5 分鐘左右備份完成。

create table public.t_test_back as (select * from public.t_test);

二、刪除原表

刪除原表之前，記得先保留好建表語句，原表索引和約束，原建表語句如下：

CREATE TABLE public.t_test (
    id bigserial NOT NULL,
    column1 int8 NOT NULL,
    column2 varchar NOT NULL,
    column3 varchar NULL,
    create_time timestamptz NULL,
    CONSTRAINT t_test_pkey PRIMARY KEY (id)
);

CREATE INDEX column1_idx ON public.t_test USING btree (column1);
CREATE INDEX column2_idx ON public.t_test USING btree (column2);

刪除原表直接使用 drop 語句：

drop table public.t_test;

我執(zhí)行的時候，大概十分鐘沒有反應(yīng)，最后查詢檢查 pg_stat_activity 視圖，發(fā)現(xiàn) public.t_test 還有 active sql 執(zhí)行。

通過下面的方式，將在執(zhí)行中的 sql 全部中斷掉。

-- 找到當(dāng)前表還在執(zhí)行的查詢
select pid, query, state from pg_stat_activity where state = 'state' and query like '%t_test%';
-- 可以取消查詢
select pg_cancel_backend(pid);
-- 也可以強(qiáng)制中止會話
select pg_terminate_backend(pid);

將執(zhí)行中的查詢?nèi)∠螅?code>drop 操作很快完成，正常執(zhí)行預(yù)估也是在 5 分鐘左右能執(zhí)行完成。

三、創(chuàng)建分區(qū)表

根據(jù)原表建表語句創(chuàng)建分區(qū)表，其中分區(qū)字段需要作為pk的一部分，我使用時間字段 create_time 作 range 分區(qū)：

CREATE TABLE public.t_test (
    id bigserial NOT NULL,
    column1 int8 NOT NULL,
    column2 varchar NOT NULL,
    column3 varchar NULL,
    create_time timestamptz NOT null DEFAULT CURRENT_TIMESTAMP,
    CONSTRAINT t_test_pkey PRIMARY KEY (id, create_time)
) partition by range(create_time);

我這里一年的數(shù)據(jù)量大概是 8 千萬，我按照半年一個分區(qū)建表，最后有一個 DEFAULT 的分區(qū)表，用來存儲分區(qū)以外的數(shù)據(jù)：

create table public.t_test_2022_02 partition of public.t_test for values from ('2022-06-01 00:00:00') to ('2022-12-31 23:59:59');

create table public.t_test_2023_01 partition of public.t_test for values from ('2023-01-01 00:00:00') to ('2023-06-30 23:59:59');
create table public.t_test_2023_02 partition of public.t_test for values from ('2023-07-01 00:00:00') to ('2023-12-31 23:59:59');

create table public.t_test_2024_01 partition of public.t_test for values from ('2024-01-01 00:00:00') to ('2024-06-30 23:59:59');
create table public.t_test_2024_02 partition of public.t_test for values from ('2024-07-01 00:00:00') to ('2024-12-31 23:59:59');

create table public.t_test_default partition of public.t_test DEFAULT;

我這里將原來的索引直接用在分區(qū)表的主表上：

CREATE INDEX column1_idx ON public.t_test USING btree (column1);
CREATE INDEX column2_idx ON public.t_test USING btree (column2);

由于我使用的 postgresql 的 serial 類型做 id 字段的自增序列，所以還需要將序列的最新值更新到比之前表的最大 id 還要大。防止主鍵沖突。

alter sequence public.t_test_id_seq restart 340000000;

四、數(shù)據(jù)恢復(fù)

由于我的表的數(shù)據(jù)主要使用的是增量數(shù)據(jù)，所以我把 id 最大的一條數(shù)據(jù)插入數(shù)據(jù)庫表后，就可以恢復(fù)服務(wù)。最后用 sql 將剩余的數(shù)據(jù)插入新表即可：

insert into public.t_test (select * from public.t_test_back);

到此這篇關(guān)于postgresql實(shí)現(xiàn)對已有數(shù)據(jù)表分區(qū)處理的操作詳解的文章就介紹到這了,更多相關(guān)postgresql數(shù)據(jù)表分區(qū)處理內(nèi)容請搜索腳本之家以前的文章或繼續(xù)瀏覽下面的相關(guān)文章希望大家以后多多支持腳本之家！

您可能感興趣的文章: