
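Building a dynamic protobuf parsing tool in C++

Updated: 2023-01-03 15:32:58  Author: 碼小方

This article walks through an example implementation of a dynamic protobuf parsing tool in C++, for readers who need to inspect serialized data without a complete proto definition.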

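Why this tool is needed

Our database stores protobuf-serialized content, and when investigating a problem it is sometimes useful to decode and inspect that content directly. Online encoders and decoders are easy to find for many formats, but I could not find one for protobuf, perhaps because the need is uncommon, so I implemented one in C++.

Requirements

Parsing protobuf data requires the proto definition, so the input must include both the serialized data and the proto definition. If the proto contains more than one message, the caller must also say which message to parse into. That gives three input parameters in total.

For convenience, the tool does not require a complete proto definition: if a nested message has no definition, that should not prevent the other fields from being parsed.

Searching for existing solutions

After searching online, most of the similar solutions I found require importing a complete proto file: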

int DynamicParseFromPBFile(const std::string& file, const std::string& classname, 
      const std::string& pb_str) {
  // ...
  // Import the proto file
  ::google::protobuf::compiler::Importer importer(&sourceTree, NULL);
  importer.Import(file);
  // Look up the message to parse into
  auto descriptor = importer.pool()->FindMessageTypeByName(classname);
  ::google::protobuf::DynamicMessageFactory factory;
  auto message = factory.GetPrototype(descriptor);
  // Create the message object dynamically
  auto msg = message->New();
  msg->ParseFromString(pb_str);
  // msg now holds the parsed structure
  return 0;
}

This does achieve dynamic parsing, but it still does not meet our requirement: parsing should work even when the proto is incomplete.

For example:

message MyMsg {
  optional uint64 id = 1;
  optional OtherMsg other = 2;
}

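MyMsg contains a field of type OtherMsg, but no definition of OtherMsg is given, so the proto cannot be parsed normally.

Where is the AST

Parsing a proto file must first turn it into an abstract syntax tree (AST). In the AST we can easily modify the proto definition, for example by deleting the other field or by changing its type to bytes, after which parsing works normally.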
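Changing the type to bytes is safe because, in the protobuf wire format, embedded messages and bytes fields share the same length-delimited wire type (2): a tag, a length, then the raw payload. The following is a minimal sketch of that idea using only the standard library, not the protobuf API. The names RawField, ReadVarint, DecodeTopLevel and SampleMyMsgBytes are hypothetical, and only wire types 0 and 2 are handled.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One decoded top-level field (hypothetical helper type for this sketch).
struct RawField {
  uint32_t number = 0;
  uint32_t wire_type = 0;  // 0 = varint, 2 = length-delimited
  uint64_t varint = 0;     // filled in when wire_type == 0
  std::string payload;     // filled in when wire_type == 2
};

// Reads a base-128 varint starting at pos; returns false on bad input.
bool ReadVarint(const std::string& buf, size_t& pos, uint64_t& out) {
  out = 0;
  for (int shift = 0; shift < 64; shift += 7) {
    if (pos >= buf.size()) return false;  // truncated
    uint8_t byte = static_cast<uint8_t>(buf[pos++]);
    out |= static_cast<uint64_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) return true;  // last byte of the varint
  }
  return false;  // longer than 10 bytes: malformed
}

// Splits a serialized message into top-level fields without any schema.
bool DecodeTopLevel(const std::string& buf, std::vector<RawField>& fields) {
  size_t pos = 0;
  while (pos < buf.size()) {
    uint64_t tag = 0;
    if (!ReadVarint(buf, pos, tag)) return false;
    RawField f;
    f.number = static_cast<uint32_t>(tag >> 3);
    f.wire_type = static_cast<uint32_t>(tag & 0x7);
    if (f.wire_type == 0) {
      if (!ReadVarint(buf, pos, f.varint)) return false;
    } else if (f.wire_type == 2) {
      uint64_t len = 0;
      if (!ReadVarint(buf, pos, len)) return false;
      if (len > buf.size() - pos) return false;  // length runs past the end
      f.payload = buf.substr(pos, static_cast<size_t>(len));
      pos += static_cast<size_t>(len);
    } else {
      return false;  // other wire types are out of scope for this sketch
    }
    fields.push_back(f);
  }
  return true;
}

// Hand-encoded MyMsg: id = 150, other = a 2-byte embedded message.
std::string SampleMyMsgBytes() {
  const unsigned char raw[] = {0x08, 0x96, 0x01, 0x12, 0x02, 0x08, 0x2A};
  return std::string(reinterpret_cast<const char*>(raw), sizeof(raw));
}
```

Decoding the sample yields field 1 as a varint and field 2 as an opaque two-byte payload. The decoder never needs to know whether field 2 was declared as OtherMsg or as bytes, which is why the substitution does not disturb the other fields.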

So where is the AST that a proto file is parsed into? The only way to find out was to read the source code.

After some digging, I found this code in the FindFileByName method:

bool SourceTreeDescriptorDatabase::FindFileByName(const std::string& filename,
                                                  FileDescriptorProto* output) {
  // ...
  io::Tokenizer tokenizer(input.get(), &file_error_collector);
  Parser parser;
  // Parse it.
  output->set_name(filename);
  return parser.Parse(&tokenizer, output) && !file_error_collector.had_errors();
}

This code shows that FileDescriptorProto is the AST structure we are looking for. So what kind of structure is it?

In fact, FileDescriptorProto is itself a message defined in proto:

message FileDescriptorProto {
  optional string name = 1;     // file name, relative to root of source tree
  optional string package = 2;  // e.g. "foo", "foo.bar", etc.
  // All top-level definitions in this file.
  repeated DescriptorProto message_type = 4;
  repeated EnumDescriptorProto enum_type = 5;
  repeated ServiceDescriptorProto service = 6;
  repeated FieldDescriptorProto extension = 7;
  // ...
}

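Its fields show that it represents an entire proto file, including all of the message, enum, and other definitions in that file.

Writing the code

Step 1

Following the source code above, parse the input proto definition into a FileDescriptorProto object: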

// The proto definition supplied by the user
istringstream ss(proto);
io::IstreamInputStream input(&ss);
// Parse it into the FileDescriptorProto AST
io::Tokenizer tokenizer(&input, nullptr);
FileDescriptorProto output;
compiler::Parser parser;
if (!parser.Parse(&tokenizer, &output)) {
  err_msg = "parse proto failed";
  return -1;
}
// Give the in-memory file a name and drop source location info
output.set_name("proto");
output.clear_source_code_info();
printf("MSG: proto parsed output: %s\n", output.DebugString().c_str());

Step 2

Process the FileDescriptorProto object, changing every field whose type has no definition to bytes so that the proto can be parsed normally:

int ConvertUnknownType2Bytes(FileDescriptorProto& file_descriptor_proto) {
  // Collect the names of all types (messages and enums) defined in the file
  set<string> typename_set;
  for (auto const& enumtype : file_descriptor_proto.enum_type()) {
    typename_set.insert(enumtype.name());
  }
  for (auto const& msgtype : file_descriptor_proto.message_type()) {
    typename_set.insert(msgtype.name());
    // Types declared directly inside a message count as defined too
    for (auto const& subtype : msgtype.nested_type()) {
      typename_set.insert(subtype.name());
    }
    for (auto const& enumtype : msgtype.enum_type()) {
      typename_set.insert(enumtype.name());
    }
  }
  // Walk every field and check whether its type is defined
  for (auto& msgtype : *file_descriptor_proto.mutable_message_type()) {
    for (auto& field : *msgtype.mutable_field()) {
      auto type_name = field.type_name();
      // type_name is empty for scalar types
      if (!type_name.empty()) {
        // A type name missing from typename_set is converted to bytes
        if (typename_set.find(type_name) == typename_set.end()) {
          field.clear_type_name();
          field.set_type(FieldDescriptorProto_Type_TYPE_BYTES);
        }
      }
    }
  }
  return 0;
}
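Type references can also point at messages nested several levels deep, and may be written as dotted paths such as Outer.Inner. The following is a minimal sketch of a recursive variant of the same idea. It runs over simplified, hypothetical structs (FieldDef, MsgDef) that stand in for FieldDescriptorProto and DescriptorProto; it is not the protobuf API, and its name matching is a permissive approximation rather than protobuf's real scope resolution.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for FieldDescriptorProto.
struct FieldDef {
  std::string name;
  std::string type_name;  // empty for scalar types
  bool is_bytes = false;  // set when the field has been rewritten to bytes
};

// Hypothetical stand-in for DescriptorProto.
struct MsgDef {
  std::string name;
  std::vector<FieldDef> fields;
  std::vector<std::string> enums;  // names of enums declared inside
  std::vector<MsgDef> nested;      // messages declared inside
};

// Records every message and enum name under msg, both as a simple name
// ("Inner") and as a dotted path from the top level ("Outer.Inner").
void CollectNames(const MsgDef& msg, const std::string& prefix,
                  std::set<std::string>& known) {
  const std::string full = prefix.empty() ? msg.name : prefix + "." + msg.name;
  known.insert(msg.name);
  known.insert(full);
  for (const std::string& e : msg.enums) {
    known.insert(e);
    known.insert(full + "." + e);
  }
  for (const MsgDef& sub : msg.nested) CollectNames(sub, full, known);
}

// Rewrites every field whose type is not in known to bytes, at any depth.
// Returns the number of fields rewritten.
int RewriteUnknown(MsgDef& msg, const std::set<std::string>& known) {
  int rewritten = 0;
  for (FieldDef& f : msg.fields) {
    if (!f.type_name.empty() && known.count(f.type_name) == 0) {
      f.type_name.clear();
      f.is_bytes = true;
      ++rewritten;
    }
  }
  for (MsgDef& sub : msg.nested) rewritten += RewriteUnknown(sub, known);
  return rewritten;
}
```

Calling CollectNames once per top-level message and then RewriteUnknown on each gives the same effect as the loop above, extended to fields of nested messages. A real implementation would apply the same recursion to DescriptorProto's nested_type, enum_type and mutable_field accessors.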

Step 3

Build the modified FileDescriptorProto object and create an object of the specified message type:

// Build the proto and check for errors
SimpleDescriptorDatabase db;
db.Add(output);
DescriptorPool pool(&db);
auto descriptor = pool.FindMessageTypeByName(msg_type_name);
if (descriptor == nullptr) {
  // The proto structure is invalid
  err_msg = "parse proto failed. FindMessageTypeByName result is null";
  return -1;
}
DynamicMessageFactory factory;
auto message = factory.GetPrototype(descriptor);
unique_ptr<Message> msg(message->New());

Step 4

Parse the serialized data into msg:

msg->ParseFromString(serilized_pb);
cout << "proto msg: " << msg->ShortDebugString().c_str() << endl;

With that, dynamic parsing works, and the unreadable binary data in serilized_pb is printed in readable form.
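A field that was rewritten to bytes shows up in the printed output as an escaped string rather than as a nested structure. The following is a minimal sketch of C-style byte escaping, written with only the standard library to illustrate what such a value looks like. EscapeBytes is a hypothetical helper, and the exact escaping produced by ShortDebugString is an assumption here that may differ between protobuf versions.

```cpp
#include <cstdio>
#include <string>

// Escapes raw bytes C-style: common control characters and quotes get a
// backslash form, other non-printable bytes become 3-digit octal escapes.
std::string EscapeBytes(const std::string& raw) {
  std::string out;
  for (unsigned char c : raw) {
    switch (c) {
      case '\n': out += "\\n"; break;
      case '\r': out += "\\r"; break;
      case '\t': out += "\\t"; break;
      case '"':  out += "\\\""; break;
      case '\'': out += "\\'"; break;
      case '\\': out += "\\\\"; break;
      default:
        if (c >= 0x20 && c < 0x7F) {
          out += static_cast<char>(c);  // printable ASCII passes through
        } else {
          char buf[5];  // backslash + 3 octal digits + NUL
          std::snprintf(buf, sizeof(buf), "\\%03o", static_cast<unsigned>(c));
          out += buf;
        }
    }
  }
  return out;
}
```

For example, the two payload bytes 0x08 0x2A escape to a backslash followed by 010 and then an asterisk. If the content of such a field matters, supplying the missing message definition and running the tool again turns it back into structured output.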

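Summary

To dynamically parse an incomplete proto, we first found in the source code how a proto definition is converted into an AST, namely FileDescriptorProto.

Next, we modified the AST object, turning the invalid proto into a valid one.

Finally, we used the modified FileDescriptorProto to construct the required message object and parsed the serialized data into it.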
