Understanding Protobuf in C++ in One Article

Updated: 2023-05-17 15:43:51  Author: 令狐少俠、

Protocol Buffers is a lightweight, efficient format for structured data that can be used for serialization. This article explains how to use Protobuf in detail and may serve as a useful reference for interested readers.

Introduction

Google Protocol Buffers (Protobuf for short) is the mixed-language data standard used inside Google. More than 48,162 message types defined in more than 12,183 .proto files are currently in use. They are used in RPC systems and in persistent data storage systems.

Protocol Buffers is a lightweight, efficient format for serializing structured data. It is language-neutral, platform-neutral and extensible, and it is well suited to data storage and to RPC data exchange in areas such as instant messaging.

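Compared with its predecessors XML and JSON, Protobuf produces smaller messages and parses faster, which makes it a good choice of transport format for communication applications such as IM.
The core of protobuf consists of:

Defining messages: the message structures, marked with the message keyword.
Defining interfaces: the interface paths and parameters, marked with the service keyword.

With the mechanism protobuf provides, two servers only need to agree on the interface method names (service) and the parameters (message) in order to communicate. They do not need to deal with tedious link protocols or field parsing, which greatly reduces server-side design and development cost.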

Checking the version

protoc --version    # print the installed protoc version

Differences between proto3 and proto2

proto3 supports more languages than proto2 while being simpler. It drops some complex syntax and features and favors convention over syntax.

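The first non-blank, non-comment line must be: syntax = "proto3";
The "required" field rule was removed, and what proto2 called "optional" became the unlabeled default, described as "singular" (a term used in the documentation, not a keyword).
In proto3, repeated fields of scalar numeric types are packed by default; in proto2 they are not.
- In proto2, [packed=true] must be specified explicitly to get the more compact packed encoding.
Language support was added for Go, Ruby and JavaNano.
proto2 allows an explicit default; proto3 only uses the built-in defaults.
- In proto2, the default option can assign a default value to a field. In proto3, the default value is determined solely by the field type. The defaults are fixed by convention, and there is no syntax for specifying them.
A proto3 enum must have a zero value, so that 0 can serve as the numeric default. The zero value must be the first element, for compatibility with proto2 semantics, where the first enum value is always the default. proto2 has no such requirement.
Before version 3.5, proto3 discarded unknown fields. Version 3.5 reintroduced the preservation of unknown fields to match proto2 behavior: in 3.5 and later, unknown fields are kept during parsing and included in the serialized output.
proto3 removed proto2 extensions and added the Any type (described as still under development) and a JSON mapping.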

Defining a data structure

syntax = "proto3";
message Person {
    string name = 1;
    int32 id = 2;
    string email = 3;
}

Field types

Protobuf defines a set of basic data types:

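.proto type	C++ type	Description
double	double	Double-precision floating point
float	float	Single-precision floating point
int32	int32	Variable-length encoding. Inefficient for negative numbers; use sint32 if negative values are likely
int64	int64	Variable-length encoding. Inefficient for negative numbers; use sint64 if negative values are likely
uint32	uint32	Variable-length encoding
uint64	uint64	Variable-length encoding
sint32	int32	Variable-length encoding. Signed integer; encodes negative numbers more efficiently than int32
sint64	int64	Variable-length encoding. Signed integer; encodes negative numbers more efficiently than int64
fixed32	uint32	Always 4 bytes. More efficient than uint32 if values are usually greater than 2^28
fixed64	uint64	Always 8 bytes. More efficient than uint64 if values are usually greater than 2^56
sfixed32	int32	Always 4 bytes
sfixed64	int64	Always 8 bytes
bool	bool	Boolean
string	string	Text that must be UTF-8 encoded or 7-bit ASCII
bytes	string	May contain any arbitrary sequence of bytes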
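
The difference between int32 and sint32 for negative values can be illustrated with a small sketch. It is not code from the protobuf library: the helper names (VarintSize, ZigZagEncode32 and the two PayloadSize functions) are made up for illustration, and the sketch only models the documented wire format, where a negative int32 is sign-extended to 64 bits before varint encoding and sint32 applies ZigZag encoding first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of bytes a value occupies as a base-128 varint (7 payload bits per byte).
std::size_t VarintSize(std::uint64_t value) {
    std::size_t bytes = 1;
    while (value >= 0x80) {
        value >>= 7;
        ++bytes;
    }
    return bytes;
}

// int32 field: the value is sign-extended to 64 bits, so every negative
// number has its top bit set and needs the maximum varint length.
std::size_t Int32FieldPayloadSize(std::int32_t value) {
    return VarintSize(static_cast<std::uint64_t>(static_cast<std::int64_t>(value)));
}

// ZigZag maps 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ...
// so numbers with a small absolute value stay small.
std::uint32_t ZigZagEncode32(std::int32_t value) {
    return (static_cast<std::uint32_t>(value) << 1) ^
           static_cast<std::uint32_t>(value >> 31);
}

// sint32 field: ZigZag first, then varint.
std::size_t Sint32FieldPayloadSize(std::int32_t value) {
    return VarintSize(ZigZagEncode32(value));
}
```

With these helpers, Int32FieldPayloadSize(-1) is 10 bytes while Sint32FieldPayloadSize(-1) is 1 byte, which is why sint32 is the better choice for fields that often hold negative numbers.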

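Field numbers

Every field in a message definition has a unique number. These field numbers identify the fields in the binary format and should not be changed once the message type is in use.

Field numbers (tags) range from 1 to 2^29 - 1 (536,870,911). The numbers 19000 through 19999 are reserved by protobuf and cannot be used.

Although the range of numbers is large, the number chosen affects the protobuf encoding:

1 ~ 15: the tag is encoded in one byte
16 ~ 2047: the tag is encoded in two bytes

Give frequently used fields numbers in the range 1~15 to reduce the size of the encoded data. Because a number cannot be changed once it is assigned, also remember to leave some numbers in 1~15 free for future extension.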
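
The one-byte and two-byte ranges follow from how a tag is built: the field number is shifted left by three bits, the wire type fills the low three bits, and the result is written as a varint. The following is a minimal sketch of that arithmetic; MakeTag, TagSize and kMaxFieldNumber are illustrative names, not part of the protobuf API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Largest field number allowed by the format: 2^29 - 1.
constexpr std::uint32_t kMaxFieldNumber = (1u << 29) - 1;

// Tag = (field_number << 3) | wire_type; wire types are 0..5.
std::uint32_t MakeTag(std::uint32_t field_number, std::uint32_t wire_type) {
    assert(field_number >= 1 && field_number <= kMaxFieldNumber);
    assert(wire_type <= 5);
    return (field_number << 3) | wire_type;
}

// Bytes needed to write the tag of a field as a base-128 varint.
// The wire type does not change the size, so 0 is used here.
std::size_t TagSize(std::uint32_t field_number) {
    std::uint32_t tag = MakeTag(field_number, 0);
    std::size_t bytes = 1;
    while (tag >= 0x80) {
        tag >>= 7;
        ++bytes;
    }
    return bytes;
}
```

Field 15 yields the tag value 120, which fits in the 7 payload bits of one byte, while field 16 yields 128 and needs a second byte. The same reasoning puts the two-byte limit at field 2047.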

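Field rules

singular: a message can have zero or one of this field (but not more than one).
repeated: the field can be repeated any number of times (including zero). The order of the repeated values is preserved.

In proto3, repeated fields of scalar numeric types use packed encoding by default.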
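
To make the effect of packed encoding concrete, here is a hand-written sketch that encodes a repeated field of non-negative integers both ways. It models the wire format only and is not how the generated code is implemented; AppendVarint, EncodePacked and EncodeUnpacked are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Appends a value as a base-128 varint, least significant group first.
void AppendVarint(std::uint64_t value, std::vector<std::uint8_t>* out) {
    while (value >= 0x80) {
        out->push_back(static_cast<std::uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out->push_back(static_cast<std::uint8_t>(value));
}

// Packed: one tag with wire type 2 (length-delimited), the payload size
// in bytes, then all element varints back to back.
std::vector<std::uint8_t> EncodePacked(std::uint32_t field_number,
                                       const std::vector<std::uint32_t>& values) {
    assert(field_number >= 1);
    std::vector<std::uint8_t> payload;
    for (std::uint32_t v : values) AppendVarint(v, &payload);

    std::vector<std::uint8_t> out;
    if (payload.empty()) return out;  // an empty repeated field writes nothing
    AppendVarint((static_cast<std::uint64_t>(field_number) << 3) | 2, &out);
    AppendVarint(payload.size(), &out);
    out.insert(out.end(), payload.begin(), payload.end());
    return out;
}

// Unpacked: a tag with wire type 0 (varint) is repeated before every element.
std::vector<std::uint8_t> EncodeUnpacked(std::uint32_t field_number,
                                         const std::vector<std::uint32_t>& values) {
    assert(field_number >= 1);
    std::vector<std::uint8_t> out;
    for (std::uint32_t v : values) {
        AppendVarint(static_cast<std::uint64_t>(field_number) << 3, &out);
        AppendVarint(v, &out);
    }
    return out;
}
```

For field number 4 holding the values 3, 270 and 86942, the packed form is the 8 bytes 22 06 03 8E 02 9E A7 05, while the unpacked form repeats the tag byte 20 three times and takes 9 bytes. The saving grows with the number of elements.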

In proto2, the rules are:

required: exactly one
optional: zero or one
repeated: any number (including zero)

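Adding more message types

Multiple message types can be defined in a single .proto file. This is useful when defining several related messages. For example, to define the reply message format that corresponds to the SearchRequest message type, add it to the same .proto file: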

message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 result_per_page = 3;
}
message SearchResponse {
 ...
}

Adding comments

Comments in a .proto file use the C/C++ style // or /* ... */ syntax.

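Reserved fields

If a message type is updated by deleting a field entirely or by commenting it out, future users may reuse that field number when making their own changes to the type. This can cause serious problems if an old version of the same .proto file is loaded later, including data corruption and privacy bugs.
To prevent this, mark the field's name or number as reserved. The compiler then reports an error if that number or name is used again.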

message Foo {
    // Note: field names and field numbers cannot be mixed in the same reserved statement
    reserved 2, 15, 9 to 11;
    reserved "foo", "bar";
}

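Default values

When a message is parsed, any field missing from the encoded message is set to a default value that depends on its type:

string: the empty string
bytes: empty bytes
bool: false
numeric types: 0
enums: the first defined enum value, which is 0
messages: the field is not set; the exact behavior depends on the generated language

After received data is deserialized, a scalar field holding its default value is ambiguous. For example, if a bool is false, there is no way to tell whether the sender set it to false or never set it at all.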
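
The ambiguity comes from the encoding itself: a proto3 scalar field that holds its default value is not written to the wire. The toy model below illustrates this for a single field bool enabled = 1. ToyFlag, Encode and Decode are hypothetical names for this sketch and are far simpler than the generated protobuf code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy stand-in for a message with one field: bool enabled = 1;
struct ToyFlag {
    bool enabled = false;
};

// A value equal to the default is skipped, so only `true` produces bytes.
std::vector<std::uint8_t> Encode(const ToyFlag& msg) {
    std::vector<std::uint8_t> out;
    if (msg.enabled) {
        out.push_back(0x08);  // tag: field number 1, wire type 0 (varint)
        out.push_back(0x01);  // value: true
    }
    return out;
}

// Decoding starts from the default and overwrites it only if the field is present.
ToyFlag Decode(const std::vector<std::uint8_t>& bytes) {
    ToyFlag msg;
    if (bytes.size() == 2 && bytes[0] == 0x08) {
        msg.enabled = (bytes[1] != 0);
    }
    return msg;
}
```

A ToyFlag that was never touched and one explicitly set to false both encode to zero bytes, so the receiver sees the same thing in both cases.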

Defining enums

Enums can also be defined in protobuf and used as field types, for example:

message SearchRequest {
    string query = 1;
    int32 page_number = 2; // Which page number do we want
    int32 result_per_page = 3; // Number of results to return per page
    enum Corpus {
        UNIVERSAL = 0;
        WEB = 1;
        IMAGES = 2;
        LOCAL = 3;
        NEWS = 4;
        PRODUCTS = 5;
        VIDEO = 6;
    }
    Corpus corpus = 4;
}

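An enum can be defined inside or outside a message. An enum defined inside a message can be used by other messages by referring to it as MessageType.EnumType. The first enum value must be 0, and values must not repeat unless aliases are enabled with option allow_alias = true, for example: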

enum EnumAllowingAlias {
    option allow_alias = true;
    UNKNOWN = 0;
    STARTED = 1;
    RUNNING = 1;
}

Enum values must fit in a 32-bit integer. Because enum values use varint encoding, negative values are inefficient and therefore not recommended.

Compiling the .proto file

The data structures defined in a .proto file are aimed at developers and business logic, not at storage or transmission. To store or transmit the data, it has to be serialized, deserialized, read and written. Protobuf provides the interface code for this, which is generated by the protoc compiler with the following command:

Compiling with protoc

protoc -I=$SRC_DIR --cpp_out=$DST_DIR $SRC_DIR/xxx.proto
# $SRC_DIR: source directory containing the .proto file
# --cpp_out: generate C++ code
# $DST_DIR: destination directory for the generated code
# xxx.proto: the .proto file to generate interface code for

Compiling with CMake

protobuf_generate_cpp

find_package(Protobuf REQUIRED)
include_directories(${Protobuf_INCLUDE_DIRS})
# PROTO_SRCS receives the paths of the .pb.cc files, PROTO_HDRS those of the .pb.h files
protobuf_generate_cpp(PROTO_SRCS PROTO_HDRS person.proto)

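This approach has two drawbacks:

The protobuf_generate_cpp command must be in the same CMakeLists.txt as the add_executable() or add_library() command that uses the generated files.
The output path of the generated sources cannot be set; they are always generated in the corresponding build directory.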

execute_process

The execute_process command in CMake can be used to call protoc directly, which makes it possible to choose where the sources are generated:

find_package(Protobuf REQUIRED)
include_directories(${Protobuf_INCLUDE_DIRS})
execute_process(COMMAND ${PROTOBUF_PROTOC_EXECUTABLE} -I=${PROJECT_SOURCE_DIR}/proto/ --cpp_out=${PROJECT_SOURCE_DIR}/  ${PROJECT_SOURCE_DIR}/proto/xxx.proto)
add_executable(file main.cpp ${PROJECT_SOURCE_DIR}/xxx.pb.cc file.cpp)

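This approach still has a drawback: the proto sources are regenerated every time cmake runs, so make recompiles the program because the source files changed (the content is the same, but the files were regenerated).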

Example 1: defining a proto

Define the proto

syntax = "proto3";
// The package declaration prevents naming conflicts between projects. The generated classes are placed in a namespace with the same name as the package.
package tutorial;
message Student {
    // Field numbers: every field in a message definition has a unique number. It identifies the field in the binary format and should not be changed once the message type is in use
    uint64 id = 1;
    string name = 2;
    // A singular field can occur zero times or once. "singular" is not a keyword, so writing it in front of a field is a compile error
    // singular string email = 3;
    string email = 3;
    enum PhoneType {
        MOBILE = 0; // in proto3 the first value must be 0; values should not be duplicated
        HOME = 1;
    }
    message PhoneNumber {
        string number = 1;
        PhoneType type = 2;
    }
    // repeated: the field can be repeated any number of times (including zero). The order of the repeated values is preserved
    repeated PhoneNumber phone = 4;
}

Protobuf API

Looking at the generated class, the compiler produces accessor functions for every field:

  // optional uint64 id = 1;
  void clear_id();
  static const int kIdFieldNumber = 1;
  ::google::protobuf::uint64 id() const;
  void set_id(::google::protobuf::uint64 value);
  // optional string name = 2;
  void clear_name();
  static const int kNameFieldNumber = 2;
  const ::std::string& name() const;
  void set_name(const ::std::string& value);
  void set_name(const char* value);
  void set_name(const char* value, size_t size);
  ::std::string* mutable_name();
  ::std::string* release_name();
  void set_allocated_name(::std::string* name);
  // optional string email = 3;
  void clear_email();
  static const int kEmailFieldNumber = 3;
  const ::std::string& email() const;
  void set_email(const ::std::string& value);
  void set_email(const char* value);
  void set_email(const char* value, size_t size);
  ::std::string* mutable_email();
  ::std::string* release_email();
  void set_allocated_email(::std::string* email);
  // repeated .tutorial.Student.PhoneNumber phone = 4;
  int phone_size() const;
  void clear_phone();
  static const int kPhoneFieldNumber = 4;
  const ::tutorial::Student_PhoneNumber& phone(int index) const;
  ::tutorial::Student_PhoneNumber* mutable_phone(int index);
  ::tutorial::Student_PhoneNumber* add_phone();
  ::google::protobuf::RepeatedPtrField< ::tutorial::Student_PhoneNumber >*
      mutable_phone();
  const ::google::protobuf::RepeatedPtrField< ::tutorial::Student_PhoneNumber >&
      phone() const;

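Basic functions:

set_* functions: set the field value
clear_* functions: reset the field to its empty state

The numeric field id has only the basic accessors, while the string fields name and email have extra functions:

mutable_* functions: return a direct pointer to the string

Repeated fields also have some special functions. Looking at the functions for the repeated field phone, it is possible to:

Get the _size of the repeated field (how many phone numbers are associated with the Student).
Get a specific phone number by index.
mutable_phone function: update an existing phone number at a given index.
add_phone function: add another phone number to the message.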

Standard message functions

  void CopyFrom(const Student& from);
  void MergeFrom(const Student& from);
  void Clear();
  bool IsInitialized() const;

Serialization and deserialization

bool SerializeToString(string* output) const; // Serializes the message and stores the bytes in the given string. The bytes are binary, not text; string is only used as a convenient container.
bool ParseFromString(const string& data); // Parses a message from the given string.
bool SerializeToArray(void* data, int size) const; // Serializes the message into an array.
bool ParseFromArray(const void* data, int size); // Parses a message from an array.
bool SerializeToOstream(ostream* output) const; // Writes the message to the given C++ ostream.
bool ParseFromIstream(istream* input); // Parses a message from the given C++ istream.
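
To give an idea of what the serialized bytes look like, the sketch below hand-encodes the two fields uint64 id = 1 and string name = 2 according to the documented wire format. It does not use the protobuf library; AppendVarint and EncodeIdAndName are hypothetical helpers, and std::string is used as a byte container in the same way as SerializeToString does.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Appends a value as a base-128 varint, least significant group first.
void AppendVarint(std::uint64_t value, std::string* out) {
    while (value >= 0x80) {
        out->push_back(static_cast<char>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out->push_back(static_cast<char>(value));
}

// Encodes the fields `uint64 id = 1;` and `string name = 2;` by hand.
// As in proto3, a field holding its default value is not written.
std::string EncodeIdAndName(std::uint64_t id, const std::string& name) {
    std::string out;
    if (id != 0) {
        AppendVarint((1u << 3) | 0, &out);  // tag: field 1, wire type 0 (varint)
        AppendVarint(id, &out);
    }
    if (!name.empty()) {
        AppendVarint((2u << 3) | 2, &out);  // tag: field 2, wire type 2 (length-delimited)
        AppendVarint(name.size(), &out);    // length of the string in bytes
        out += name;                        // raw UTF-8 bytes
    }
    return out;
}
```

For id = 150 and name = "testing" this yields 08 96 01 for the id, followed by 12 07 and the seven bytes of the string, 12 bytes in total. Field names never appear in the output, which is one reason the binary form is smaller than XML or JSON.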

Example 2: reading and writing proto files

The following simple example wraps the read and write functions so that they can be called directly.

config.conf:

pfe_file: "pfe.trt"
rpn_file: "rpn.trt"

pointpillars_config.proto:

syntax = "proto3";
message PointPillarsConfig {
  string pfe_file = 1;
  string rpn_file = 2;
}

main.cpp:

#include <fcntl.h>   // open, O_RDONLY
#include <unistd.h>  // close
#include <iostream>
#include <string>
#include <google/protobuf/io/zero_copy_stream_impl.h>  // FileInputStream, FileOutputStream
#include <google/protobuf/text_format.h>
#include "pointpillars_config.pb.h"  // generated from pointpillars_config.proto
#include "file.h"
// Writes the message in text format to an open file descriptor, then closes the descriptor.
bool SetProtoToASCIIFile(const google::protobuf::Message &message,
                         int file_descriptor) {
  using google::protobuf::TextFormat;
  using google::protobuf::io::FileOutputStream;
  using google::protobuf::io::ZeroCopyOutputStream;
  if (file_descriptor < 0) {
    std::cout << "Invalid file descriptor.";
    return false;
  }
  ZeroCopyOutputStream *output = new FileOutputStream(file_descriptor);
  bool success = TextFormat::Print(message, output);
  delete output;  // flushes buffered data before the descriptor is closed
  close(file_descriptor);
  return success;
}
// Reads a text-format file and parses it into the given message.
bool GetProtoFromASCIIFile(const std::string& file_name,
    google::protobuf::Message* message) {
    using google::protobuf::TextFormat;
    using google::protobuf::io::FileInputStream;
    using google::protobuf::io::ZeroCopyInputStream;
    int file_descriptor = open(file_name.c_str(), O_RDONLY);
    if (file_descriptor < 0) {
        std::cout << "Failed to open file " << file_name << " in text mode.";
        // Failed to open;
        return false;
    }
    ZeroCopyInputStream* input = new FileInputStream(file_descriptor);
    bool success = TextFormat::Parse(input, message);
    if (!success) {
        std::cout << "Failed to parse file " << file_name << " as text proto.";
    }
    delete input;
    close(file_descriptor);
    return success;
}
int main(int argc, char *argv[])
{
    // Place this macro in main (somewhere before the protobuf library is used) to verify that the linked library version matches the headers used at compile time. The process aborts if a mismatch is detected
    GOOGLE_PROTOBUF_VERIFY_VERSION;
    PointPillarsConfig config;
    std::string config_file = "../config/pointpillars.conf";
    // Call the reader defined above (the original called an undefined GetProtoFromFile)
    if (!GetProtoFromASCIIFile(config_file, &config)) {
        return 1;
    }
    std::cout << config.pfe_file() << std::endl;
    google::protobuf::ShutdownProtobufLibrary();
    return 0;
}

CMakeLists.txt:

CMAKE_MINIMUM_REQUIRED(VERSION 3.10)
project(file)
find_package(Protobuf REQUIRED)
include_directories(
    ${Protobuf_INCLUDE_DIRS}
    ${GLOB_INCLUDE_DIRS}
    ${CMAKE_CURRENT_BINARY_DIR}
)
# protobuf_generate_cpp(PROTO_SRCS PROTO_HDRS proto/pointpillars_config.proto)
# add_executable(file main.cpp ${PROTO_SRCS} file.cpp)
execute_process(COMMAND ${PROTOBUF_PROTOC_EXECUTABLE} -I=${PROJECT_SOURCE_DIR}/proto/ --cpp_out=${PROJECT_SOURCE_DIR}/ ${PROJECT_SOURCE_DIR}/proto/pointpillars_config.proto)
add_executable(file main.cpp ${PROJECT_SOURCE_DIR}/pointpillars_config.pb.cc file.cpp)
target_link_libraries(file ${Protobuf_LIBRARIES} )

The example above reads data from a file into a proto message. SetProtoToASCIIFile is the function that writes proto data to a file, and it can be called in the same way.