Getting Started with protobuf

jopen, 10 years ago

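protobuf, short for Protocol Buffers, is an efficient, fast data interchange format released by Google. Like XML and Thrift, it is a data interchange protocol (Thrift, of course, also provides RPC). Unlike XML, which is a structured text format, protobuf is a binary format, so transmitting, serializing, and parsing it is more efficient. That is why protobuf is so popular.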

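protobuf uses its own compiler to compile a protocol file into code for the target language, which makes serializing and parsing data convenient. At the time of writing, Google provided implementations for three languages: Java, C++, and Python. Each implementation includes the compiler support and the library files for that language.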

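The following introduces protobuf's syntax. protobuf IDL is saved in *.proto files. The data types in a proto file fall into two broad categories: composite types and standard types. Composite types are enums and messages; standard types include integers, floating-point numbers, strings, and so on, and are covered in detail later.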

message

The most commonly used data format is the message. For example, an order can be represented as a message like this:

message Order {
    required uint64 uid = 1;
    required float cost = 2;
    optional string tag = 3;
}

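Compiling it to C++ with protobuf generates the corresponding XXX.pb.h and XXX.pb.cc. Each message becomes a class that holds the data members, the functions that operate on them, and the corresponding serialization and parsing functions.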

class Order : public ::google::protobuf::Message {
 public:
  ...
  // accessors -------------------------------------------------------
  ...

  ::google::protobuf::uint64 uid_;
  ::std::string* tag_;
  float cost_;
};

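What you need to know about the message format:

1. The tag (field number) assigned at the end of each field: this number identifies the field in the serialized binary data. Each field's tag must be unique within its message, and it must not be changed once the message is in use; otherwise existing data can no longer be parsed correctly.

2. The modifier in front of the data type: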

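required: the field must be given a value and cannot be left empty; otherwise the message is considered "uninitialized". In the Java API, building an "uninitialized" message throws a RuntimeException, and parsing an "uninitialized" message throws an IOException. Other than this, a "required" field behaves exactly like an "optional" field.
optional: the field may or may not be set. If it is not set, reading it returns the default value.
repeated: the field may be repeated any number of times, including zero. The order of the repeated values is preserved in the protocol buffer. Think of this field as an array that sizes itself automatically.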
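To make the role of the field number concrete, here is a minimal C++ sketch of how the protobuf wire format combines a field number with a wire type into the key written in front of each value. The names WireType, MakeKey, and FieldNumberOf are hypothetical helpers made up for illustration and are not part of the protobuf library; the formula (field_number << 3) | wire_type and the wire type values follow the protobuf encoding documentation.

```cpp
#include <cassert>
#include <cstdint>

// Wire types needed for the Order example (values from the protobuf wire format).
enum WireType : std::uint32_t {
  kVarint = 0,           // uint64 and the other integer types
  kLengthDelimited = 2,  // string, bytes, nested messages
  kFixed32 = 5           // float, fixed32
};

// Hypothetical helper: builds the key that precedes a field's value on the wire.
inline std::uint32_t MakeKey(std::uint32_t field_number, WireType type) {
  assert(field_number >= 1);  // field numbers start at 1
  return (field_number << 3) | static_cast<std::uint32_t>(type);
}

// Hypothetical helper: recovers the field number from a key.
inline std::uint32_t FieldNumberOf(std::uint32_t key) { return key >> 3; }
```

For Order, uid = 1 gets the key 0x08, cost = 2 gets 0x15, and tag = 3 gets 0x1A. A parser matches incoming values to fields by these numbers alone, which is why renumbering a field makes previously written data unreadable.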

Enums

An enum means the same thing as an enum type in C++ or Java:

enum Corpus {
    UNIVERSAL = 0;
    WEB = 1;
    IMAGES = 2;
    LOCAL = 3;
    NEWS = 4;
    PRODUCTS = 5;
    VIDEO = 6;
}

Running protoc --cpp_out=. enum_test.proto generates the following C++ code:

enum Corpus {
  UNIVERSAL = 0,
  WEB = 1,
  IMAGES = 2,
  LOCAL = 3,
  NEWS = 4,
  PRODUCTS = 5,
  VIDEO = 6
};

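Basic data types

protobuf supports the following basic (scalar) data types, which the original post showed in a figure: double, float, int32, int64, uint32, uint64, sint32, sint64, fixed32, fixed64, sfixed32, sfixed64, bool, string, and bytes.

message in detail:

For a message, the C++ code that protobuf generates automatically contains the following: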

//xxx.proto
message Order {
    required uint64 uid = 1;
    required float cost = 2;
    optional string tag = 3;
}

//xxx.pb.h
class Order : public ::google::protobuf::Message {
 public:
  ...
  // accessors -------------------------------------------------------

  // required uint64 uid = 1;
  inline bool has_uid() const;
  inline void clear_uid();
  static const int kUidFieldNumber = 1;
  inline ::google::protobuf::uint64 uid() const;
  inline void set_uid(::google::protobuf::uint64 value);

  // required float cost = 2;
  inline bool has_cost() const;
  inline void clear_cost();
  static const int kCostFieldNumber = 2;
  inline float cost() const;
  inline void set_cost(float value);

  // optional string tag = 3;
  inline bool has_tag() const;
  inline void clear_tag();
  static const int kTagFieldNumber = 3;
  inline const ::std::string& tag() const;
  inline void set_tag(const ::std::string& value);
  inline void set_tag(const char* value);
  inline void set_tag(const char* value, size_t size);
  inline ::std::string* mutable_tag();
  inline ::std::string* release_tag();
  inline void set_allocated_tag(::std::string* tag);

  // @@protoc_insertion_point(class_scope:Order)
 private:
  inline void set_has_uid();
  inline void clear_has_uid();
  inline void set_has_cost();
  inline void clear_has_cost();
  inline void set_has_tag();
  inline void clear_has_tag();

  ::google::protobuf::uint32 _has_bits_[1];

  ::google::protobuf::uint64 uid_;
  ::std::string* tag_;
  float cost_;
};

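For every data member of a message, protobuf automatically generates the functions that operate on it. The main ones for each field are has_uid(), clear_uid(), uid(), and set_uid(). They respectively check whether the field has been set, reset the field's value and clear its "set" record, return the field's value, and set the field. For the uid field in the example, these functions are implemented as follows: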

//xxx.pb.h

// required uint64 uid = 1;
inline bool Order::has_uid() const {
  return (_has_bits_[0] & 0x00000001u) != 0;
}
inline void Order::set_has_uid() {
  _has_bits_[0] |= 0x00000001u;
}
inline void Order::clear_has_uid() {
  _has_bits_[0] &= ~0x00000001u;
}
inline void Order::clear_uid() {
  uid_ = GOOGLE_ULONGLONG(0);
  clear_has_uid();
}
inline ::google::protobuf::uint64 Order::uid() const {
  // @@protoc_insertion_point(field_get:Order.uid)
  return uid_;
}
inline void Order::set_uid(::google::protobuf::uint64 value) {
  set_has_uid();
  uid_ = value;
  // @@protoc_insertion_point(field_set:Order.uid)
}

As the implementation shows, the code uses _has_bits_ to record whether a field has been set. _has_bits_ is defined as follows:

::google::protobuf::uint32 _has_bits_[1];

Each bit of _has_bits_ records whether one field has been set: the masks 0x01, 0x02, 0x04, ... mark whether the 1st, 2nd, 3rd, ... field has been set.
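The bitmask bookkeeping described above can be sketched in a few lines of self-contained C++. OrderSketch is a simplified stand-in written for illustration; it is not the generated class, and it covers only the uid and cost fields. The IsInitialized() check here is an assumption about how a required-field check can be built on top of the bits, not a copy of generated code.

```cpp
#include <cstdint>

// Simplified stand-in for the generated Order class (illustration only).
class OrderSketch {
 public:
  // A field counts as "set" when its bit in has_bits_ is 1.
  bool has_uid() const { return (has_bits_ & kUidBit) != 0; }
  bool has_cost() const { return (has_bits_ & kCostBit) != 0; }

  std::uint64_t uid() const { return uid_; }
  float cost() const { return cost_; }

  // Setters store the value and turn the field's bit on.
  void set_uid(std::uint64_t value) { has_bits_ |= kUidBit; uid_ = value; }
  void set_cost(float value) { has_bits_ |= kCostBit; cost_ = value; }

  // clear_* resets the value to its default and turns the bit off.
  void clear_uid() { uid_ = 0; has_bits_ &= ~kUidBit; }
  void clear_cost() { cost_ = 0.0f; has_bits_ &= ~kCostBit; }

  // Both required fields must have their bits set.
  bool IsInitialized() const {
    return (has_bits_ & (kUidBit | kCostBit)) == (kUidBit | kCostBit);
  }

 private:
  static constexpr std::uint32_t kUidBit = 0x00000001u;   // 1st field
  static constexpr std::uint32_t kCostBit = 0x00000002u;  // 2nd field

  std::uint32_t has_bits_ = 0;
  std::uint64_t uid_ = 0;
  float cost_ = 0.0f;
};
```

Keeping the "set" flags in a separate bitmask lets a field distinguish "never set" from "set to the default value", which a plain data member cannot do on its own.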

protobuf provides the following interfaces for serializing protocol data into binary data:

// Serialization ---------------------------------------------------
// Methods for serializing in protocol buffer format.  Most of these
// are just simple wrappers around ByteSize() and SerializeWithCachedSizes().

// Write a protocol buffer of this message to the given output.  Returns
// false on a write error.  If the message is missing required fields,
// this may GOOGLE_CHECK-fail.
bool SerializeToCodedStream(io::CodedOutputStream* output) const;
// Like SerializeToCodedStream(), but allows missing required fields.
bool SerializePartialToCodedStream(io::CodedOutputStream* output) const;

// Write the message to the given zero-copy output stream.  All required
// fields must be set.
bool SerializeToZeroCopyStream(io::ZeroCopyOutputStream* output) const;
bool SerializePartialToZeroCopyStream(io::ZeroCopyOutputStream* output) const;

// Serialize the message and store it in the given string.  All required
// fields must be set.
bool SerializeToString(string* output) const;
bool SerializePartialToString(string* output) const;

// Serialize the message and store it in the given byte array.  All required
// fields must be set.
bool SerializeToArray(void* data, int size) const;
bool SerializePartialToArray(void* data, int size) const;

string SerializeAsString() const;
string SerializePartialAsString() const;

// Like SerializeToString(), but appends to the data to the string's existing
// contents.  All required fields must be set.
bool AppendToString(string* output) const;
bool AppendPartialToString(string* output) const;

// Serialize the message and write it to the given file descriptor.  All
// required fields must be set.
bool SerializeToFileDescriptor(int file_descriptor) const;
bool SerializePartialToFileDescriptor(int file_descriptor) const;

// Serialize the message and write it to the given C++ ostream.  All
// required fields must be set.
bool SerializeToOstream(ostream* output) const;
bool SerializePartialToOstream(ostream* output) const;
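To give a feel for the kind of bytes these serialization methods produce, here is a self-contained C++ sketch that hand-encodes the three Order fields following the protobuf wire format. It does not use the protobuf library. AppendVarint, AppendFloat, and EncodeOrderSketch are hypothetical helpers written for illustration, and the sketch assumes a 32-bit IEEE-754 float and that fields are written in field-number order.

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// Appends value as a base-128 varint: 7 payload bits per byte,
// with the high bit set on every byte except the last.
inline void AppendVarint(std::uint64_t value, std::string* out) {
  while (value >= 0x80) {
    out->push_back(static_cast<char>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out->push_back(static_cast<char>(value));
}

// Appends a float as 4 little-endian bytes (wire type 5, fixed32).
inline void AppendFloat(float value, std::string* out) {
  std::uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  for (int i = 0; i < 4; ++i) {
    out->push_back(static_cast<char>((bits >> (8 * i)) & 0xFF));
  }
}

// Hypothetical helper: lays out Order{uid, cost, tag} as key/value pairs.
inline std::string EncodeOrderSketch(std::uint64_t uid, float cost,
                                     const std::string& tag) {
  std::string out;
  AppendVarint((1u << 3) | 0u, &out);  // key: field 1, varint
  AppendVarint(uid, &out);
  AppendVarint((2u << 3) | 5u, &out);  // key: field 2, fixed32
  AppendFloat(cost, &out);
  AppendVarint((3u << 3) | 2u, &out);  // key: field 3, length-delimited
  AppendVarint(tag.size(), &out);      // length prefix for the string
  out += tag;
  return out;
}
```

With these helpers, EncodeOrderSketch(150, 1.0f, "ab") yields 12 bytes: 08 96 01 15 00 00 80 3F 1A 02 61 62. The same order encoded as XML would need several times as many bytes, which illustrates the efficiency argument made at the start of the article.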
