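Splitting Log Files with split on Linux: A Detailed Walkthrough

Updated: 2025-03-27 15:18:05  Author: 江湖有緣

split is a command-line tool on Unix and Unix-like systems such as Linux that divides a large file into smaller pieces. This article shows how to use split to cut up log files.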

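I. Introduction to the split command

split is a command-line tool on Unix and Unix-like systems (such as Linux) that divides a large file into smaller pieces. This is particularly useful when handling large log files, transferring data, or working with limited storage.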

II. Getting help for the split command

2.1 split --help output

In a terminal, use --help to display the basic help text for the split command.

root@jeven01:~# split --help
Usage: split [OPTION]... [FILE [PREFIX]]
Output pieces of FILE to PREFIXaa, PREFIXab, ...;
default size is 1000 lines, and default PREFIX is 'x'.

With no FILE, or when FILE is -, read standard input.

Mandatory arguments to long options are mandatory for short options too.
  -a, --suffix-length=N   generate suffixes of length N (default 2)
      --additional-suffix=SUFFIX  append an additional SUFFIX to file names
  -b, --bytes=SIZE        put SIZE bytes per output file
  -C, --line-bytes=SIZE   put at most SIZE bytes of records per output file
  -d                      use numeric suffixes starting at 0, not alphabetic
      --numeric-suffixes[=FROM]  same as -d, but allow setting the start value
  -x                      use hex suffixes starting at 0, not alphabetic
      --hex-suffixes[=FROM]  same as -x, but allow setting the start value
  -e, --elide-empty-files  do not generate empty output files with '-n'
      --filter=COMMAND    write to shell COMMAND; file name is $FILE
  -l, --lines=NUMBER      put NUMBER lines/records per output file
  -n, --number=CHUNKS     generate CHUNKS output files; see explanation below
  -t, --separator=SEP     use SEP instead of newline as the record separator;
                            '\0' (zero) specifies the NUL character
  -u, --unbuffered        immediately copy input to output with '-n r/...'
      --verbose           print a diagnostic just before each
                            output file is opened
      --help     display this help and exit
      --version  output version information and exit

The SIZE argument is an integer and optional unit (example: 10K is 10*1024).
Units are K,M,G,T,P,E,Z,Y (powers of 1024) or KB,MB,... (powers of 1000).
Binary prefixes can be used, too: KiB=K, MiB=M, and so on.

CHUNKS may be:
  N       split into N files based on size of input
  K/N     output Kth of N to stdout
  l/N     split into N files without splitting lines/records
  l/K/N   output Kth of N to stdout without splitting lines/records
  r/N     like 'l' but use round robin distribution
  r/K/N   likewise but only output Kth of N to stdout

GNU coreutils online help: <https://www.gnu.org/software/coreutils/>
Full documentation <https://www.gnu.org/software/coreutils/split>
or available locally via: info '(coreutils) split invocation'

2.2 Explanation of split options

The options from the split help text are summarized in the following table:

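| Option | Description |
| --- | --- |
| -a, --suffix-length=N | Generate suffixes of length N (default 2) |
| --additional-suffix=SUFFIX | Append an additional SUFFIX to file names |
| -b, --bytes=SIZE | Put SIZE bytes in each output file |
| -C, --line-bytes=SIZE | Put at most SIZE bytes of records in each output file |
| -d | Use numeric suffixes starting at 0 instead of alphabetic ones |
| --numeric-suffixes[=FROM] | Same as -d, but allows setting the start value |
| -x | Use hexadecimal suffixes starting at 0 instead of alphabetic ones |
| --hex-suffixes[=FROM] | Same as -x, but allows setting the start value |
| -e, --elide-empty-files | Do not generate empty output files with '-n' |
| --filter=COMMAND | Write to the shell command COMMAND; the file name is $FILE |
| -l, --lines=NUMBER | Put NUMBER lines/records in each output file |
| -n, --number=CHUNKS | Generate CHUNKS output files; see the explanation below |
| -t, --separator=SEP | Use SEP instead of newline as the record separator; '\0' (zero) specifies the NUL character |
| -u, --unbuffered | Immediately copy input to output with '-n r/...' |
| --verbose | Print a diagnostic just before each output file is opened |
| --help | Display help and exit |
| --version | Output version information and exit |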
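
As an illustration of the --filter option from the table, the sketch below splits a sample log into 1000-line pieces and compresses each piece as it is written. It assumes GNU coreutils split and gzip are installed; the file name app.log, the prefix app.log.part_, and the generated sample content are hypothetical and do not come from the original article.

```shell
#!/bin/sh
# Sketch only: assumes GNU coreutils split and gzip are available.
set -e
workdir=$(mktemp -d)   # hypothetical scratch directory
cd "$workdir"

# Build a small sample "log" of 2500 lines (hypothetical content).
seq 1 2500 | sed 's/^/log line /' > app.log

# Split into 1000-line pieces and compress each piece on the fly.
# split sets $FILE to the name of the current piece; the single quotes
# keep this shell from expanding $FILE before split runs the command.
split -l 1000 -d --filter='gzip > "$FILE.gz"' app.log app.log.part_

ls

# Decompress the pieces in order and compare them with the original.
gzip -dc app.log.part_*.gz | cmp - app.log && echo "round trip OK"
```

The single quotes matter here: $FILE has to reach split unexpanded, because split sets it separately for each piece before running the command.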

SIZE argument

SIZE is an integer followed by an optional unit (for example, 10K means 10*1024).
Units are K, M, G, T, P, E, Z, Y (powers of 1024) or KB, MB, ... (powers of 1000).
Binary prefixes can also be used: KiB=K, MiB=M, and so on.
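
The difference between the two unit families is easy to overlook, so here is a minimal sketch that splits the same file once with 1K and once with 1KB. It assumes GNU coreutils; sample.bin and the prefixes bin_ and dec_ are hypothetical names chosen for the illustration.

```shell
#!/bin/sh
# Sketch only: assumes GNU coreutils (split, dd, wc).
set -e
workdir=$(mktemp -d)   # hypothetical scratch directory
cd "$workdir"

# Create a 10240-byte sample file (10 blocks of 1024 bytes).
dd if=/dev/zero of=sample.bin bs=1024 count=10 2>/dev/null

# 1K means 1024 bytes per piece.
split -b 1K sample.bin bin_
# 1KB means 1000 bytes per piece.
split -b 1KB sample.bin dec_

echo "size of first 1K piece:  $(wc -c < bin_aa) bytes"
echo "size of first 1KB piece: $(wc -c < dec_aa) bytes"
echo "number of 1K pieces:  $(ls bin_* | wc -l)"
echo "number of 1KB pieces: $(ls dec_* | wc -l)"
```

With a 10240-byte input, the 1K run yields ten pieces of 1024 bytes, while the 1KB run yields ten pieces of 1000 bytes plus an eleventh piece holding the remaining 240 bytes.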

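CHUNKS argument

N: split into N files based on the size of the input
K/N: output the Kth of N pieces to standard output
l/N: split into N files without splitting lines/records
l/K/N: output the Kth of N pieces to standard output without splitting lines/records
r/N: like 'l', but use round-robin distribution
r/K/N: likewise, but output only the Kth of N pieces to standard output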
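
The CHUNKS forms can be tried on a small file. The sketch below assumes GNU coreutils; app.log, its generated contents, and the prefixes chunk_ and rr_ are hypothetical names used only for this illustration.

```shell
#!/bin/sh
# Sketch only: assumes GNU coreutils split.
set -e
workdir=$(mktemp -d)   # hypothetical scratch directory
cd "$workdir"

# Nine sample records, one per line (hypothetical content).
seq 1 9 | sed 's/^/record /' > app.log

# l/3: three output files, never cutting a line in the middle.
split -n l/3 -d app.log chunk_
wc -l chunk_*

# l/2/3: write only the second of the three chunks to standard output.
split -n l/2/3 app.log > second_chunk.txt

# r/3: round robin, so line 1 goes to the first file, line 2 to the
# second, line 3 to the third, line 4 to the first again, and so on.
split -n r/3 -d app.log rr_
cat rr_00
```

Concatenating the chunk_ files in order reproduces app.log, whereas the rr_ files interleave the lines and therefore do not.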

III. Basic usage of the split command

3.1 Generating a test file

Generate a 2 MiB test file:

root@jeven01:/test# dd if=/dev/zero bs=1M count=2 of=test.file
2+0 records in
2+0 records out
2097152 bytes (2.1 MB, 2.0 MiB) copied, 0.00158099 s, 1.3 GB/s
root@jeven01:/test# ll -h test.file
-rw-r--r-- 1 root root 2.0M Oct  3 20:35 test.file

3.2 Splitting into 200 KiB pieces

Use the -b option to split the file created above into pieces of 200 KiB (204,800 bytes) each; the last piece holds the remainder:

root@jeven01:/test# split -b 200k test.file
root@jeven01:/test# ls
test.file  xaa  xab  xac  xad  xae  xaf  xag  xah  xai  xaj  xak
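
A common follow-up is to confirm that the pieces can be joined back into the original. The sketch below repeats the split and then reassembles the pieces with cat. It assumes GNU coreutils and uses random data instead of zeros so that the comparison is meaningful; rebuilt.file is a hypothetical name.

```shell
#!/bin/sh
# Sketch only: assumes GNU coreutils (split, head, cmp).
set -e
workdir=$(mktemp -d)   # hypothetical scratch directory
cd "$workdir"

# Create a 2 MiB test file filled with random bytes.
head -c $((2 * 1024 * 1024)) /dev/urandom > test.file

# Split into 200 KiB pieces with the default prefix "x".
split -b 200k test.file

# The shell expands x?? in sorted order (xaa, xab, ...), which is the
# order in which split wrote the pieces.
cat x?? > rebuilt.file

# cmp exits with status 0 only if the two files are byte-identical.
cmp test.file rebuilt.file && echo "pieces reassemble to the original"
```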

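3.3 Splitting into files with numeric suffixes

Use the -a and -d options together to split the large file into pieces with numeric suffixes; -a 3 sets the suffix length to three digits.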

root@jeven01:/test# split -b 200k test.file -d -a 3
root@jeven01:/test# ll
total 4104
drwxr-xr-x  2 root root    4096 Oct  3 20:42 ./
drwxr-xr-x 22 root root    4096 Sep 24 22:37 ../
-rw-r--r--  1 root root 2097152 Oct  3 20:35 test.file
-rw-r--r--  1 root root  204800 Oct  3 20:42 x000
-rw-r--r--  1 root root  204800 Oct  3 20:42 x001
-rw-r--r--  1 root root  204800 Oct  3 20:42 x002
-rw-r--r--  1 root root  204800 Oct  3 20:42 x003
-rw-r--r--  1 root root  204800 Oct  3 20:42 x004
-rw-r--r--  1 root root  204800 Oct  3 20:42 x005
-rw-r--r--  1 root root  204800 Oct  3 20:42 x006
-rw-r--r--  1 root root  204800 Oct  3 20:42 x007
-rw-r--r--  1 root root  204800 Oct  3 20:42 x008
-rw-r--r--  1 root root  204800 Oct  3 20:42 x009
-rw-r--r--  1 root root   49152 Oct  3 20:42 x010
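
If numbering should start at 1 instead of 0, the long form --numeric-suffixes=FROM listed in the help text can be used. The following is a minimal sketch assuming GNU coreutils; the start value 1 is just an example.

```shell
#!/bin/sh
# Sketch only: assumes GNU coreutils split.
set -e
workdir=$(mktemp -d)   # hypothetical scratch directory
cd "$workdir"

# Same 2 MiB test file as in section 3.1.
dd if=/dev/zero bs=1M count=2 of=test.file 2>/dev/null

# Three-digit numeric suffixes, starting at 1 instead of 0.
split -b 200k --numeric-suffixes=1 -a 3 test.file

ls
```

The short option -d takes no argument, so a start value can only be given through the long option.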

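3.4 Splitting a file by line count

Split by line count: cut test.file into a new file every 1000 lines, with the new files named logs_part_aa, logs_part_ab, and so on. Note that the test.file generated in section 3.1 consists only of zero bytes and contains no newline characters, so against that file the command produces a single piece; run it on a real text log to get multiple pieces.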

split -l 1000 test.file logs_part_
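
To see line-based splitting produce several pieces, the command can be run against a text file. The sketch below assumes GNU coreutils; app.log and its 2500 generated lines are hypothetical stand-ins for a real log.

```shell
#!/bin/sh
# Sketch only: assumes GNU coreutils (split, seq, sed, wc).
set -e
workdir=$(mktemp -d)   # hypothetical scratch directory
cd "$workdir"

# Generate a 2500-line sample log (hypothetical content).
seq 1 2500 | sed 's/^/log line /' > app.log

# Every 1000 lines go into a new file with the prefix logs_part_.
split -l 1000 app.log logs_part_

# Show the line count of each piece.
wc -l logs_part_*
```

With 2500 input lines this gives logs_part_aa and logs_part_ab with 1000 lines each and logs_part_ac with the remaining 500.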

3.5 Specifying the file name prefix

The pieces are named with the suffixes 000, 001, and so on, and use split_file as the prefix.

root@jeven01:/test# split -b 200k test.file -d -a 3 split_file
root@jeven01:/test# ll -h
total 4.1M
drwxr-xr-x  2 root root 4.0K Oct  3 20:57 ./
drwxr-xr-x 22 root root 4.0K Sep 24 22:37 ../
-rw-r--r--  1 root root 200K Oct  3 20:57 split_file000
-rw-r--r--  1 root root 200K Oct  3 20:57 split_file001
-rw-r--r--  1 root root 200K Oct  3 20:57 split_file002
-rw-r--r--  1 root root 200K Oct  3 20:57 split_file003
-rw-r--r--  1 root root 200K Oct  3 20:57 split_file004
-rw-r--r--  1 root root 200K Oct  3 20:57 split_file005
-rw-r--r--  1 root root 200K Oct  3 20:57 split_file006
-rw-r--r--  1 root root 200K Oct  3 20:57 split_file007
-rw-r--r--  1 root root 200K Oct  3 20:57 split_file008
-rw-r--r--  1 root root 200K Oct  3 20:57 split_file009
-rw-r--r--  1 root root  48K Oct  3 20:57 split_file010
-rw-r--r--  1 root root 2.0M Oct  3 20:35 test.file
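
The prefix can be combined with --additional-suffix from the help text so that each piece keeps a file extension. This is a minimal sketch assuming GNU coreutils; the prefix split_file_ and the extension .log are example choices.

```shell
#!/bin/sh
# Sketch only: assumes GNU coreutils split.
set -e
workdir=$(mktemp -d)   # hypothetical scratch directory
cd "$workdir"

# Same 2 MiB test file as in section 3.1.
dd if=/dev/zero bs=1M count=2 of=test.file 2>/dev/null

# Prefix "split_file_", three-digit numeric suffix, then ".log".
split -b 200k -d -a 3 --additional-suffix=.log test.file split_file_

ls
```

The resulting names run from split_file_000.log to split_file_010.log.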

IV. Notes

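1. Preserve log integrity: when splitting a log file by line count or byte count, keep each log record intact. Avoid cutting one complete record across two files, since that can cause misreadings during log analysis. The -C option caps the number of bytes per output file while keeping whole lines together wherever a line fits within the limit.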

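2. Choose a sensible piece size: set the size of each piece according to your storage needs and log-processing strategy. Pieces that are too large are awkward to process, while pieces that are too small add management overhead. For example, if roughly 50 MB of logs are generated per day, splitting them into pieces of about 10 MB is a reasonable choice.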

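3. Use a suitable naming scheme: give the pieces a clear, meaningful prefix and suffix so they are easy to manage and identify. Set the suffix length with -a and add numeric suffixes with -d or --numeric-suffixes, which makes it easier to process the files in order.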

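4. Consider timestamp information: if the log contains timestamps, make sure each record stays whole so that its timestamp remains attached to it. This helps with locating and searching entries by time later on. If records are delimited by a character other than newline, the -t option sets that single-character record separator.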

5. Test and verify the result: before using a command in production, run it on a small sample and check that the output files match expectations. Once the settings are confirmed, run it on the full log. This step helps catch problems early and adjust the approach in time.

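6. Back up the original log file: always back up the original log before any splitting operation. Although split does not modify the source file, a backup protects against accidental deletion or other human error.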
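
Several of these notes can be combined into one small script: back up the log, split it with -C so that no line is cut, and verify the result. The sketch below assumes GNU coreutils; app.log, its generated contents, the 10K size limit, and the names app.log.bak and app.log.part_ are all hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes GNU coreutils (split, seq, sed, tail, cmp).
set -e
workdir=$(mktemp -d)   # hypothetical scratch directory
cd "$workdir"

# Sample log with a timestamp on every line (hypothetical content).
seq 1 2000 | sed 's/^/2025-03-27 15:18:05 INFO request id=/' > app.log

# Note 6: keep a backup of the original before doing anything else.
cp app.log app.log.bak

# Notes 1 and 3: at most 10 KiB per piece, whole lines only,
# three-digit numeric suffixes.
split -C 10K -d -a 3 app.log app.log.part_

# Note 5: verify that every piece ends with a newline, which means
# no record was cut in the middle.
for f in app.log.part_*; do
    if [ "$(tail -c 1 "$f" | wc -l)" -ne 1 ]; then
        echo "$f ends in the middle of a line" >&2
        exit 1
    fi
done

# Verify that the pieces, joined in order, equal the backup.
cat app.log.part_* | cmp - app.log.bak && echo "split verified"
```

Because every line in this sample is far shorter than 10 KiB, -C never has to break a line; a single line longer than the limit would still be split across pieces.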
