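How to Compress Data in Java: Implementations and a Performance Test of the Common Approaches
1 BZip2
ZIP is a file format for data compression and archiving. It was created by Phil Katz, who published the specification in early 1989, and it normally compresses with the Deflate algorithm. ZIP files usually carry the ".zip" extension, and the MIME type is application/zip. ZIP is still one of the mainstream compression formats; its competitors include RAR and the open-source 7z format. RAR and 7z generally achieve higher compression ratios than ZIP, and 7-Zip, which ships a free compression tool, is gradually being adopted in more areas.
Microsoft has built ZIP support into Windows since Windows ME, so users can open and create .zip archives even when no archiving software is installed; OS X and the popular Linux distributions offer similar support. As a result, ZIP is usually the first choice for distributing files over the network.
1.1 Adding the dependency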
<dependency>
    <groupId>org.apache.ant</groupId>
    <artifactId>ant</artifactId>
    <version>1.10.6</version>
</dependency>
1.2 BZip2 utility class
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.tools.bzip2.CBZip2InputStream;
import org.apache.tools.bzip2.CBZip2OutputStream;

public class BZip2Util {

    private static final int BUFFER_SIZE = 8192;

    // Note: Ant's CBZip2 streams neither write nor consume the two-byte "BZ" magic,
    // so the bytes produced here round-trip with this class but are not a complete .bz2 file.
    public static byte[] compress(byte[] bytes) {
        if (bytes == null) {
            throw new NullPointerException("bytes is null");
        }
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (CBZip2OutputStream bzip2 = new CBZip2OutputStream(bos)) {
            bzip2.write(bytes);
            // finish() flushes the last block, so bos is complete before it is read
            bzip2.finish();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException("BZip2 compress error", e);
        }
    }

    public static byte[] decompress(byte[] bytes) {
        if (bytes == null) {
            throw new NullPointerException("bytes is null");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ByteArrayInputStream bis = new ByteArrayInputStream(bytes);
        try (CBZip2InputStream bzip2 = new CBZip2InputStream(bis)) {
            byte[] buffer = new byte[BUFFER_SIZE];
            int n;
            while ((n = bzip2.read(buffer)) > -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException("BZip2 decompress error", e);
        }
    }
}
2 Deflater
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class DeflaterUtil {

    private static final int BUFFER_SIZE = 8192;

    private DeflaterUtil() {
    }

    public static byte[] compress(byte[] bytes) {
        if (bytes == null) {
            throw new NullPointerException("bytes is null");
        }
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(bytes);
            deflater.finish();
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            byte[] outputBytes = new byte[BUFFER_SIZE];
            while (!deflater.finished()) {
                int length = deflater.deflate(outputBytes);
                bos.write(outputBytes, 0, length);
            }
            return bos.toByteArray();
        } finally {
            // Release the native zlib memory even if something above throws
            deflater.end();
        }
    }

    public static byte[] decompress(byte[] bytes) {
        if (bytes == null) {
            throw new NullPointerException("bytes is null");
        }
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(bytes);
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            byte[] outputBytes = new byte[BUFFER_SIZE];
            while (!inflater.finished()) {
                int length = inflater.inflate(outputBytes);
                if (length == 0) {
                    // No progress (for example truncated input): stop instead of looping forever
                    break;
                }
                bos.write(outputBytes, 0, length);
            }
            return bos.toByteArray();
        } catch (DataFormatException e) {
            throw new RuntimeException("Deflater decompress error", e);
        } finally {
            inflater.end();
        }
    }
}
3 Gzip
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipUtil {

    private static final int BUFFER_SIZE = 8192;

    private GzipUtil() {
    }

    public static byte[] compress(byte[] bytes) {
        if (bytes == null) {
            throw new NullPointerException("bytes is null");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(bytes);
            gzip.flush();
            // finish() writes the gzip trailer, so out is complete before it is read
            gzip.finish();
            return out.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException("gzip compress error", e);
        }
    }

    public static byte[] decompress(byte[] bytes) {
        if (bytes == null) {
            throw new NullPointerException("bytes is null");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gunzip = new GZIPInputStream(new ByteArrayInputStream(bytes))) {
            byte[] buffer = new byte[BUFFER_SIZE];
            int n;
            while ((n = gunzip.read(buffer)) > -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException("gzip decompress error", e);
        }
    }
}
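4 LZ4
4.1 Introduction
The LZ4 compression algorithm was designed and implemented by Yann Collet in 2011 and belongs to the LZ77 family. Strictly speaking, LZ77 is less an algorithm than a coding theory: it defines the principle but not how to implement it. Besides LZ4, algorithms derived from LZ77 include LZSS, LZB and LZH.
Taken overall, LZ4 is one of the most efficient compression algorithms available today. It favors compression and decompression speed over compression ratio, which is unremarkable; in essence it gives up some space savings in exchange for time.
According to the benchmarks published in the LZ4 GitHub repository, compression runs at more than 500 MB/s per core and scales across multiple cores; the decoder is also extremely fast, reaching speeds on the order of GB/s per core.
4.2 The idea behind the algorithm
- LZ77 encoding: a dictionary-based scheme that encodes long strings (also called matches or phrases) as short tokens, using a small token in place of a phrase already in the dictionary. In other words, it compresses data by replacing long strings that occur repeatedly with short tokens. The symbols it processes need not be text characters; they can be symbols of any size.
- Maintaining the phrase dictionary: LZ77 uses a look-ahead buffer and a sliding window. Data is first loaded into the look-ahead buffer, forming a batch of phrases, which become part of the dictionary as the sliding window moves over them.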
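The two bullets above can be made concrete with a deliberately naive sketch. It is not how LZ4 or any production LZ77 codec is written: the class name Lz77Sketch, the Token type and the parameters windowSize, lookAheadSize and minMatch are all made up for illustration, and the match search is a brute-force scan of the window.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a brute-force LZ77-style tokenizer (hypothetical names throughout).
final class Lz77Sketch {

    /** Either a single literal byte (length == 0) or a back-reference (offset, length). */
    static final class Token {
        final int offset;
        final int length;
        final byte literal;

        Token(int offset, int length, byte literal) {
            this.offset = offset;
            this.length = length;
            this.literal = literal;
        }

        @Override
        public String toString() {
            return length == 0 ? "'" + (char) literal + "'" : "(" + offset + "," + length + ")";
        }
    }

    static List<Token> encode(byte[] in, int windowSize, int lookAheadSize, int minMatch) {
        List<Token> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < in.length) {
            int bestLength = 0;
            int bestOffset = 0;
            // The sliding window is the already-encoded data just before pos.
            for (int cand = Math.max(0, pos - windowSize); cand < pos; cand++) {
                int length = 0;
                // The look-ahead buffer bounds how far a match may extend.
                while (length < lookAheadSize && pos + length < in.length
                        && in[cand + length] == in[pos + length]) {
                    length++;
                }
                if (length > bestLength) {
                    bestLength = length;
                    bestOffset = pos - cand;
                }
            }
            if (bestLength >= minMatch) {
                // Long enough repeat: emit a short token instead of the bytes themselves.
                tokens.add(new Token(bestOffset, bestLength, (byte) 0));
                pos += bestLength;
            } else {
                tokens.add(new Token(0, 0, in[pos]));
                pos++;
            }
        }
        return tokens;
    }

    static byte[] decode(List<Token> tokens) {
        int total = 0;
        for (Token t : tokens) {
            total += t.length == 0 ? 1 : t.length;
        }
        byte[] out = new byte[total];
        int pos = 0;
        for (Token t : tokens) {
            if (t.length == 0) {
                out[pos++] = t.literal;
            } else {
                // Copy byte by byte so that overlapping references also work.
                for (int i = 0; i < t.length; i++) {
                    out[pos] = out[pos - t.offset];
                    pos++;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] input = "abcde_fghabcde_ghxxahcde".getBytes(StandardCharsets.UTF_8);
        List<Token> tokens = encode(input, 16, 8, 4);
        System.out.println(tokens);
        System.out.println(new String(decode(tokens), StandardCharsets.UTF_8));
    }
}
```

With a minimum match length of 4, the second occurrence of abcde_ is replaced by a single (9,6) back-reference while every other byte stays a literal.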
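4.3 Implementation
4.3.1 LZ4 data format
- LZ4 defines two formats: lz4_block_format and lz4_frame_format.
- lz4_frame_format is meant for special scenarios such as file, pipe and stream compression; this article focuses on lz4_block_format, the format used in ordinary scenarios.
A compressed block consists of multiple sequences. A sequence is a run of literals (uncompressed bytes) followed by a match copy. Each sequence starts with a token: the token determines the lengths of the literals and of the match copy, and the offset determines where the match is copied from.
- literals: bytes that are not part of a repeat, i.e. the part that is stored uncompressed
- literals length: the length of that uncompressed part
- match length: the length of the repeated (compressible) part
A single sequence is laid out as: token (1 byte), optional extra literal-length bytes, the literals, a 2-byte little-endian offset, and optional extra match-length bytes. A complete LZ4 compressed block is made up of multiple such sequences.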
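4.3.2 LZ4 compression process
LZ4 follows the LZ77 idea described above and compresses data through a sliding window, a hash table and data encoding. The compressor looks for matches with a scan window of at least 4 bytes, advancing 1 byte at a time, and compresses whenever it finds a repeat.
As an example, take the string abcde_fghabcde_ghxxahcde and walk through how it is compressed.
- Assume a 6-byte scan window that advances 1 byte at a time.
- LZ4 starts scanning by hashing positions 0-5; the value is not in the hash table, so it is stored.
- Moving on, it hashes positions 1-6; the value is again absent, so it is stored as well.
- Scanning continues in this way until positions 9-14 are hashed and the value is found in the hash table. The repeat is compressed: the offset is set to 9 and the match length is 6, which is stored in the low 4 bits of the token as 6 - 4 = 2.
- After a match is found, LZ4 tries to extend it. Here the next byte (position 15, g) differs from the byte after the earlier occurrence (position 6, f), so the match stays at 6 bytes and scanning resumes, storing new hash values as before. Had the following byte also matched, the match would have been extended to length 7 and the token adjusted accordingly.
- Once the sliding window has scanned the whole string, the process ends.
The result of compression is the byte sequence [-110, 97, 98, 99, 100, 101, 95, 102, 103, 104, 9, 0, -112, 103, 104, 120, 120, 97, 104, 99, 100, 101]. This may look puzzling at first; the decompression walk-through explains it.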
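The match-finding step can be sketched as follows. This is a simplification rather than the real LZ4 code: it keys a HashMap on the exact 4-byte sequence instead of using LZ4's fixed-size hash table, it ignores the format's end-of-block restrictions, and the names Lz4MatchFinderSketch and findMatches are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: hash-table match finding in the spirit of LZ4 (hypothetical names).
final class Lz4MatchFinderSketch {

    static final int MIN_MATCH = 4;

    /** Returns every match found as {position, offset, length}. */
    static List<int[]> findMatches(byte[] in) {
        // Maps a 4-byte sequence (packed into an int) to the last position where it was seen.
        Map<Integer, Integer> lastSeen = new HashMap<>();
        List<int[]> matches = new ArrayList<>();
        int i = 0;
        while (i + MIN_MATCH <= in.length) {
            int key = (in[i] & 0xFF)
                    | ((in[i + 1] & 0xFF) << 8)
                    | ((in[i + 2] & 0xFF) << 16)
                    | ((in[i + 3] & 0xFF) << 24);
            Integer candidate = lastSeen.put(key, i);
            if (candidate == null) {
                i++; // not seen before: remember it and slide forward by one byte
                continue;
            }
            // Seen before: extend the match for as long as the bytes keep agreeing.
            int length = MIN_MATCH;
            while (i + length < in.length && in[candidate + length] == in[i + length]) {
                length++;
            }
            matches.add(new int[] {i, i - candidate, length});
            i += length;
        }
        return matches;
    }

    public static void main(String[] args) {
        byte[] input = "abcde_fghabcde_ghxxahcde".getBytes(StandardCharsets.UTF_8);
        for (int[] m : findMatches(input)) {
            System.out.println("match at " + m[0] + ": offset=" + m[1] + ", length=" + m[2]
                    + ", token low nibble=" + (m[2] - MIN_MATCH));
        }
    }
}
```

For the example string this reports a single match at position 9 with offset 9 and length 6, which is the sequence encoded by the first token and the offset bytes 9, 0 in the result above.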
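4.3.3 LZ4 decompression process
- LZ4 compressed bytes: [-110, 97, 98, 99, 100, 101, 95, 102, 103, 104, 9, 0, -112, 103, 104, 120, 120, 97, 104, 99, 100, 101]
- The literal bytes are the UTF-8 encoded values of the characters in the string
The compressed bytes break down into two sequences: token -110, the nine literals 97 to 104 (abcde_fgh) and the offset bytes 9, 0; then token -112 followed by the nine literals 103 to 101 (ghxxahcde).
The decompression steps, briefly:
- Decoding starts at index 0 by reading the token (-110). As an unsigned byte, -110 is 10010010 in binary: the high four bits 1001 give a literal length of 9, and the low four bits 0010 give a match length of 2 + 4 (4 being the minimum match length).
- Reading the next 9 bytes yields the 9-character string abcde_fgh. The offset is 9, so the repeat starts 9 positions back from the current output position; the low four bits gave a match length of 6 bytes, so 6 more bytes (abcde_) are copied.
- The output is now abcde_fghabcde_. Decoding continues with the token of the next sequence and finally produces abcde_fghabcde_ghxxahcde, the string before compression.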
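The steps above can be sketched as a small decoder for a raw LZ4 block. It is a minimal illustration written for this walk-through, not the lz4-java implementation: the class name Lz4BlockDecodeSketch is made up, and the input is assumed to be a single well-formed block without any frame or stream header.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative only: decodes one raw LZ4 block (hypothetical class name).
final class Lz4BlockDecodeSketch {

    static byte[] decode(byte[] src) {
        byte[] out = new byte[Math.max(16, src.length * 2)];
        int op = 0; // write position in out
        int ip = 0; // read position in src
        while (ip < src.length) {
            int token = src[ip++] & 0xFF;

            // High nibble: literal length; 15 means extra length bytes follow.
            int literalLength = token >>> 4;
            if (literalLength == 15) {
                int b;
                do {
                    b = src[ip++] & 0xFF;
                    literalLength += b;
                } while (b == 255);
            }
            out = ensureCapacity(out, op + literalLength);
            System.arraycopy(src, ip, out, op, literalLength);
            ip += literalLength;
            op += literalLength;

            // The last sequence of a block carries literals only.
            if (ip >= src.length) {
                break;
            }

            // 2-byte little-endian offset, counted back from the write position.
            int offset = (src[ip] & 0xFF) | ((src[ip + 1] & 0xFF) << 8);
            ip += 2;
            if (offset == 0 || offset > op) {
                throw new IllegalArgumentException("invalid offset " + offset);
            }

            // Low nibble: match length minus the 4-byte minimum.
            int matchLength = token & 0x0F;
            if (matchLength == 15) {
                int b;
                do {
                    b = src[ip++] & 0xFF;
                    matchLength += b;
                } while (b == 255);
            }
            matchLength += 4;
            out = ensureCapacity(out, op + matchLength);
            // Byte-by-byte copy: source and destination overlap when offset < matchLength.
            for (int i = 0; i < matchLength; i++) {
                out[op] = out[op - offset];
                op++;
            }
        }
        return Arrays.copyOf(out, op);
    }

    private static byte[] ensureCapacity(byte[] buf, int needed) {
        return needed <= buf.length ? buf : Arrays.copyOf(buf, Math.max(needed, buf.length * 2));
    }

    public static void main(String[] args) {
        byte[] block = {-110, 97, 98, 99, 100, 101, 95, 102, 103, 104, 9, 0,
                -112, 103, 104, 120, 120, 97, 104, 99, 100, 101};
        // prints abcde_fghabcde_ghxxahcde
        System.out.println(new String(decode(block), StandardCharsets.UTF_8));
    }
}
```

Masking each byte with 0xFF is what turns the signed Java value -110 into the bit pattern 10010010 used in the explanation.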
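4.4 lz4-java
lz4/lz4-java is a Java library for LZ4 compression, written by Rei Odaira and others.
4.4.1 Introduction
The library provides access to two compression methods, both of which produce valid LZ4 streams:
Fast scan (LZ4)
- Low memory footprint (16 KB)
- Very fast
- Reasonable compression ratio (depends on the redundancy of the input)
High compression (LZ4 HC)
- Medium memory footprint (256 KB)
- Rather slow (about 10 times slower than LZ4)
- Good compression ratio (depends on the size and redundancy of the input)
Streams produced by the two methods use the same compressed format, decompress very quickly, and can be decompressed by the same decompressor instance.
4.4.2 Classes
The library provides several key classes, briefly introduced here.
LZ4Factory
The entry point of the LZ4 API. There are three instances of this class:
- a native instance, which is a JNI binding to the original LZ4 C implementation
- a safe Java instance, which is a pure-Java port of the original C library using only official Java APIs
- an unsafe Java instance, which is a Java port that uses the unofficial sun.misc.Unsafe API (the Unsafe class gives direct access to system memory and lets the program manage it by hand, which helps improve Java's runtime efficiency and low-level capabilities; it can be regarded as a back door left in Java that exposes low-level operations such as direct memory access and thread scheduling)
Only the safe Java instance is guaranteed to work on every JVM, so it is recommended to obtain an LZ4Factory through fastestInstance() or fastestJavaInstance().
LZ4Compressor
There are two compressors: fastCompressor, the fast-scan compressor described in the introduction, and highCompressor, the high-compression-ratio compressor (LZ4 HC).
LZ4Decompressor
lz4-java provides two decompressors: LZ4FastDecompressor and LZ4SafeDecompressor.
The difference between them: LZ4FastDecompressor must be given the length of the original (uncompressed) data, whereas LZ4SafeDecompressor must be given the length of the compressed data.
Usage:
The two compressors and the two decompressors are interchangeable: fastCompressor can, for example, be paired with LZ4SafeDecompressor, because both compression methods produce the same stream format and either decompressor can decode it.
Beyond these basic classes, lz4-java also provides streaming classes: LZ4BlockOutputStream (output stream, encoding) and LZ4BlockInputStream (input stream, decoding).
The following code is a usage example: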
package com.oldlu.compress.utils;

import net.jpountz.lz4.*;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class Lz4Utils {

    private static final int ARRAY_SIZE = 4096;
    private static LZ4Factory factory = LZ4Factory.fastestInstance();
    private static LZ4Compressor compressor = factory.fastCompressor();
    private static LZ4FastDecompressor decompressor = factory.fastDecompressor();
    // Not used below; kept to show how the safe decompressor is obtained
    private static LZ4SafeDecompressor safeDecompressor = factory.safeDecompressor();

    public static byte[] compress(byte[] bytes) {
        if (bytes == null || bytes.length == 0) {
            return null;
        }
        try {
            ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
            LZ4BlockOutputStream lz4BlockOutputStream = new LZ4BlockOutputStream(outputStream, ARRAY_SIZE, compressor);
            lz4BlockOutputStream.write(bytes);
            // finish() writes the pending block and the end mark
            lz4BlockOutputStream.finish();
            return outputStream.toByteArray();
        } catch (Exception e) {
            System.err.println("LZ4 compression failed");
        }
        return null;
    }

    public static byte[] uncompress(byte[] bytes) {
        if (bytes == null || bytes.length == 0) {
            return null;
        }
        try {
            ByteArrayOutputStream outputStream = new ByteArrayOutputStream(ARRAY_SIZE);
            ByteArrayInputStream inputStream = new ByteArrayInputStream(bytes);
            LZ4BlockInputStream decompressedInputStream = new LZ4BlockInputStream(inputStream, decompressor);
            int count;
            byte[] buffer = new byte[ARRAY_SIZE];
            while ((count = decompressedInputStream.read(buffer)) != -1) {
                outputStream.write(buffer, 0, count);
            }
            return outputStream.toByteArray();
        } catch (Exception e) {
            System.err.println("LZ4 decompression failed");
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] bytes = "abcde_fghabcde_ghxxahcde".getBytes(StandardCharsets.UTF_8);
        byte[] compress = compress(bytes);
        byte[] decompress = uncompress(compress);
    }
}
5 SevenZ
5.1 Adding the dependencies
<dependency>
    <groupId>org.tukaani</groupId>
    <artifactId>xz</artifactId>
    <version>1.8</version>
</dependency>
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-compress</artifactId>
    <version>1.19</version>
</dependency>
5.2 Utility class
import org.apache.commons.compress.archivers.sevenz.SevenZArchiveEntry;
import org.apache.commons.compress.archivers.sevenz.SevenZFile;
import org.apache.commons.compress.archivers.sevenz.SevenZOutputFile;
import org.apache.commons.compress.utils.SeekableInMemoryByteChannel;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;

public class SevenZUtil {

    private static final int BUFFER_SIZE = 8192;

    public static byte[] compress(byte[] bytes) {
        if (bytes == null) {
            throw new NullPointerException("bytes is null");
        }
        SeekableInMemoryByteChannel channel = new SeekableInMemoryByteChannel();
        try (SevenZOutputFile z7z = new SevenZOutputFile(channel)) {
            SevenZArchiveEntry entry = new SevenZArchiveEntry();
            entry.setName("sevenZip");
            entry.setSize(bytes.length);
            z7z.putArchiveEntry(entry);
            z7z.write(bytes);
            z7z.closeArchiveEntry();
            z7z.finish();
            // array() exposes the backing buffer, which can be longer than the data written,
            // so trim it to size()
            return Arrays.copyOf(channel.array(), (int) channel.size());
        } catch (IOException e) {
            throw new RuntimeException("SevenZ compress error", e);
        }
    }

    public static byte[] decompress(byte[] bytes) {
        if (bytes == null) {
            throw new NullPointerException("bytes is null");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        SeekableInMemoryByteChannel channel = new SeekableInMemoryByteChannel(bytes);
        try (SevenZFile sevenZFile = new SevenZFile(channel)) {
            byte[] buffer = new byte[BUFFER_SIZE];
            // Concatenate the contents of every entry in the archive
            while (sevenZFile.getNextEntry() != null) {
                int n;
                while ((n = sevenZFile.read(buffer)) > -1) {
                    out.write(buffer, 0, n);
                }
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException("SevenZ decompress error", e);
        }
    }
}
6 Zip
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipUtil {

    private static final int BUFFER_SIZE = 8192;

    public static byte[] compress(byte[] bytes) {
        if (bytes == null) {
            throw new NullPointerException("bytes is null");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            ZipEntry entry = new ZipEntry("zip");
            entry.setSize(bytes.length);
            zip.putNextEntry(entry);
            zip.write(bytes);
            zip.closeEntry();
            // finish() writes the central directory; without it the archive is incomplete
            zip.finish();
            return out.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException("Zip compress error", e);
        }
    }

    public static byte[] decompress(byte[] bytes) {
        if (bytes == null) {
            throw new NullPointerException("bytes is null");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(bytes))) {
            byte[] buffer = new byte[BUFFER_SIZE];
            // Concatenate the contents of every entry in the archive
            while (zip.getNextEntry() != null) {
                int n;
                while ((n = zip.read(buffer)) > -1) {
                    out.write(buffer, 0, n);
                }
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException("Zip decompress error", e);
        }
    }
}
7 Performance comparison
We can now benchmark LZ4 against the other compression classes.
Benchmark source code:
package com.oldlu.compress.test;

import com.oldlu.compress.domain.User;
import com.oldlu.compress.service.UserService;
import com.oldlu.compress.utils.*;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.results.format.ResultFormatType;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.util.concurrent.TimeUnit;

// GzipUtils, Bzip2Utils, DeflateUtils, SnappyUtils and ProtostuffUtils are classes from the
// author's com.oldlu.compress.utils package; their sources are not listed under these names here.
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class PerformanceTest {

    /**
     * Shared state: the user object to serialize and its compressed forms.
     */
    @State(Scope.Benchmark)
    public static class CommonState {
        User user;
        byte[] originBytes;
        byte[] lz4CompressBytes;
        byte[] snappyCompressBytes;
        byte[] gzipCompressBytes;
        byte[] bzipCompressBytes;
        byte[] deflateCompressBytes;

        @Setup(Level.Trial)
        public void prepare() {
            UserService userService = new UserService();
            user = userService.get();
            originBytes = ProtostuffUtils.serialize(user);
            lz4CompressBytes = Lz4Utils.compress(originBytes);
            snappyCompressBytes = SnappyUtils.compress(originBytes);
            gzipCompressBytes = GzipUtils.compress(originBytes);
            bzipCompressBytes = Bzip2Utils.compress(originBytes);
            deflateCompressBytes = DeflateUtils.compress(originBytes);
        }
    }

    /** LZ4 compression. */
    @Benchmark
    public byte[] lz4Compress(CommonState commonState) {
        return Lz4Utils.compress(commonState.originBytes);
    }

    /** LZ4 decompression. */
    @Benchmark
    public byte[] lz4Uncompress(CommonState commonState) {
        return Lz4Utils.uncompress(commonState.lz4CompressBytes);
    }

    /** Snappy compression. */
    @Benchmark
    public byte[] snappyCompress(CommonState commonState) {
        return SnappyUtils.compress(commonState.originBytes);
    }

    /** Snappy decompression. */
    @Benchmark
    public byte[] snappyUncompress(CommonState commonState) {
        return SnappyUtils.uncompress(commonState.snappyCompressBytes);
    }

    /** Gzip compression. */
    @Benchmark
    public byte[] gzipCompress(CommonState commonState) {
        return GzipUtils.compress(commonState.originBytes);
    }

    /** Gzip decompression. */
    @Benchmark
    public byte[] gzipUncompress(CommonState commonState) {
        return GzipUtils.uncompress(commonState.gzipCompressBytes);
    }

    /** bzip2 compression. */
    @Benchmark
    public byte[] bzip2Compress(CommonState commonState) {
        return Bzip2Utils.compress(commonState.originBytes);
    }

    /** bzip2 decompression. */
    @Benchmark
    public byte[] bzip2Uncompress(CommonState commonState) {
        return Bzip2Utils.uncompress(commonState.bzipCompressBytes);
    }

    /** Deflate compression. */
    @Benchmark
    public byte[] deflateCompress(CommonState commonState) {
        return DeflateUtils.compress(commonState.originBytes);
    }

    /** Deflate decompression. */
    @Benchmark
    public byte[] deflateUncompress(CommonState commonState) {
        return DeflateUtils.uncompress(commonState.deflateCompressBytes);
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(PerformanceTest.class.getSimpleName())
                .forks(1)
                .threads(1)
                .warmupIterations(10)
                .measurementIterations(10)
                .result("PerformanceTest.json")
                .resultFormat(ResultFormatType.JSON).build();
        new Runner(opt).run();
    }
}
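Performance charts:
The original post shows two charts here, the benchmark chart from the LZ4 website and the author's own measurements. The two differ somewhat, probably because the data being compressed is different.
- From the official site: [chart]
- Measured by hand: [chart]
At work, where we compress feature data, I compared LZ4 with Snappy. Their compression and decompression performance looks about the same, but LZ4 is more stable, with fewer latency spikes. Because this involves internal company material, I am not posting those charts.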
7.1 Compression ratio comparison
Ranked from the highest compression ratio to the lowest: bzip2 > Deflate > Gzip > lz4 > snappy
package com.oldlu.compress.demo;

import com.alibaba.fastjson.JSONObject;
import com.oldlu.compress.domain.User;
import com.oldlu.compress.service.UserService;
import com.oldlu.compress.utils.*;

import java.nio.charset.StandardCharsets;

public class CompressDemo {

    public static void main(String[] args) {
        User user = new UserService().get();
        // JSON serialization
        byte[] origin_json = JSONObject.toJSONBytes(user);
        System.out.println("Original JSON bytes: " + origin_json.length);
        // Protobuf (Protostuff) serialization
        byte[] origin = ProtostuffUtils.serialize(user);
        System.out.println("Original pb bytes: " + origin.length);
        testGzip(origin, user);
        testSnappy(origin, user);
        testLz4(origin, user);
        testBzip2(origin, user);
        testDeflate(origin, user);
    }

    // Helper for inspecting LZ4 output on short strings; not called from main
    private static void test() {
        System.out.println("--------------------");
        String str = getString();
        byte[] source = str.getBytes(StandardCharsets.UTF_8);
        byte[] compress = Lz4Utils.compress(source);
        // Convert the compressed bytes to a string
        System.out.println(translateString(compress));
        System.out.println();
        System.out.println("--------------------");
        String str2 = getString2();
        byte[] source2 = str2.getBytes(StandardCharsets.UTF_8);
        byte[] compress2 = Lz4Utils.compress(source2);
        byte[] uncompress = Lz4Utils.uncompress(compress2);
        System.out.println();
    }

    private static String translateString(byte[] bytes) {
        char[] chars = new char[bytes.length];
        for (int i = 0; i < chars.length; i++) {
            chars[i] = (char) bytes[i];
        }
        String str = new String(chars);
        return str;
    }

    private static String getString() {
        return "fghabcde_bcdefgh_abcdefghxxxxxxx";
    }

    private static String getString2() {
        return "abcde_fghabcde_ghxxahcde";
    }

    private static void testGzip(byte[] origin, User user) {
        System.out.println("---------------GZIP compression---------------");
        byte[] gzipCompress = GzipUtils.compress(origin);
        System.out.println("Gzip compressed: " + gzipCompress.length);
        byte[] gzipUncompress = GzipUtils.uncompress(gzipCompress);
        System.out.println("Gzip decompressed: " + gzipUncompress.length);
        User deUser = ProtostuffUtils.deserialize(gzipUncompress, User.class);
        System.out.println("Objects equal: " + user.equals(deUser));
    }

    private static void testSnappy(byte[] origin, User user) {
        System.out.println("---------------Snappy compression---------------");
        byte[] snappyCompress = SnappyUtils.compress(origin);
        System.out.println("Snappy compressed: " + snappyCompress.length);
        byte[] snappyUncompress = SnappyUtils.uncompress(snappyCompress);
        System.out.println("Snappy decompressed: " + snappyUncompress.length);
        User deUser = ProtostuffUtils.deserialize(snappyUncompress, User.class);
        System.out.println("Objects equal: " + user.equals(deUser));
    }

    private static void testLz4(byte[] origin, User user) {
        System.out.println("---------------Lz4 compression---------------");
        byte[] Lz4Compress = Lz4Utils.compress(origin);
        System.out.println("Lz4 compressed: " + Lz4Compress.length);
        byte[] Lz4Uncompress = Lz4Utils.uncompress(Lz4Compress);
        System.out.println("Lz4 decompressed: " + Lz4Uncompress.length);
        User deUser = ProtostuffUtils.deserialize(Lz4Uncompress, User.class);
        System.out.println("Objects equal: " + user.equals(deUser));
    }

    private static void testBzip2(byte[] origin, User user) {
        System.out.println("---------------bzip2 compression---------------");
        byte[] bzip2Compress = Bzip2Utils.compress(origin);
        System.out.println("bzip2 compressed: " + bzip2Compress.length);
        byte[] bzip2Uncompress = Bzip2Utils.uncompress(bzip2Compress);
        System.out.println("bzip2 decompressed: " + bzip2Uncompress.length);
        User deUser = ProtostuffUtils.deserialize(bzip2Uncompress, User.class);
        System.out.println("Objects equal: " + user.equals(deUser));
    }

    private static void testDeflate(byte[] origin, User user) {
        System.out.println("---------------Deflate compression---------------");
        byte[] deflateCompress = DeflateUtils.compress(origin);
        System.out.println("Deflate compressed: " + deflateCompress.length);
        byte[] deflateUncompress = DeflateUtils.uncompress(deflateCompress);
        System.out.println("Deflate decompressed: " + deflateUncompress.length);
        User deUser = ProtostuffUtils.deserialize(deflateUncompress, User.class);
        System.out.println("Objects equal: " + user.equals(deUser));
    }
}

Output:
Original JSON bytes: 5351
Original pb bytes: 3850
---------------GZIP compression---------------
Gzip compressed: 2170
Gzip decompressed: 3850
Objects equal: true
---------------Snappy compression---------------
Snappy compressed: 3396
Snappy decompressed: 3850
Objects equal: true
---------------Lz4 compression---------------
Lz4 compressed: 3358
Lz4 decompressed: 3850
Objects equal: true
---------------bzip2 compression---------------
bzip2 compressed: 2119
bzip2 decompressed: 3850
Objects equal: true
---------------Deflate compression---------------
Deflate compressed: 2167
Deflate decompressed: 3850
Objects equal: true
Process finished with exit code 0
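To relate the printed sizes to the ranking in 7.1, the space saved by each codec can be computed from the numbers in the output above (3850 bytes before compression). The class and method names below are made up for this illustration; only the byte counts come from the run shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: turns the byte counts from the demo output into percentages.
final class CompressionRatioSketch {

    /** Percentage of the original size that compression removed. */
    static double savedPercent(int originalSize, int compressedSize) {
        return 100.0 * (originalSize - compressedSize) / originalSize;
    }

    public static void main(String[] args) {
        int original = 3850; // Protostuff-serialized size from the output
        Map<String, Integer> compressedSizes = new LinkedHashMap<>();
        compressedSizes.put("bzip2", 2119);
        compressedSizes.put("Deflate", 2167);
        compressedSizes.put("Gzip", 2170);
        compressedSizes.put("lz4", 3358);
        compressedSizes.put("snappy", 3396);

        for (Map.Entry<String, Integer> e : compressedSizes.entrySet()) {
            System.out.printf("%-8s %4d bytes, %.2f%% saved%n",
                    e.getKey(), e.getValue(), savedPercent(original, e.getValue()));
        }
    }
}
```

On this sample, bzip2, Deflate and Gzip each remove roughly 44 to 45 percent of the bytes, while lz4 and snappy remove roughly 12 to 13 percent, which matches the ordering given in 7.1. These figures describe this one serialized object and should not be read as general ratios.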
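8 Summary
The sections above give a general picture of LZ4: its compression and decompression speed is excellent, while its compression ratio is not outstanding compared with other tools and depends on how repetitive the content is.
Every compression tool has strengths and weaknesses. Choosing the one that fits the scenario, playing to its strengths and avoiding its weaknesses, is what makes our work more effective.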
The above is my personal experience; I hope it serves as a useful reference.