Are there C++ equivalents for the Protocol Buffers delimited I/O functions in Java?

Question

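I'm trying to read and write multiple Protocol Buffers messages from files, in both C++ and Java. Google suggests writing a length prefix before each message, but there is no way to do that by default (that I can see).

However, the Java API in version 2.1.0 received a set of "Delimited" I/O functions which apparently do that job: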

parseDelimitedFrom
mergeDelimitedFrom
writeDelimitedTo

Are there C++ equivalents? And if not, what is the wire format of the size prefix the Java API attaches, so I can parse those messages in C++?

Update:

As of v3.3.0, these now exist in google/protobuf/util/delimited_message_util.h.

Answer 1

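I'm a bit late to the party here, but the implementations below include some optimizations missing from the other answers and will not fail after 64MB of input (though they still enforce the 64MB limit on each individual message, just not on the whole stream).

(I am the author of the C++ and Java protobuf libraries, but I no longer work for Google. Sorry that this code never made it into the official library. This is what it would look like if it had.)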

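#include <google/protobuf/io/coded_stream.h>
#include <google/protobuf/io/zero_copy_stream.h>
#include <google/protobuf/message_lite.h>

bool writeDelimitedTo(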
    const google::protobuf::MessageLite& message,
    google::protobuf::io::ZeroCopyOutputStream* rawOutput) {
  // We create a new coded stream for each message.  Don't worry, this is fast.
  google::protobuf::io::CodedOutputStream output(rawOutput);

  // Write the size.
  const int size = message.ByteSize();
  output.WriteVarint32(size);

  uint8_t* buffer = output.GetDirectBufferForNBytesAndAdvance(size);
  if (buffer != NULL) {
    // Optimization:  The message fits in one buffer, so use the faster
    // direct-to-array serialization path.
    message.SerializeWithCachedSizesToArray(buffer);
  } else {
    // Slightly-slower path when the message is multiple buffers.
    message.SerializeWithCachedSizes(&output);
    if (output.HadError()) return false;
  }

  return true;
}

bool readDelimitedFrom(
    google::protobuf::io::ZeroCopyInputStream* rawInput,
    google::protobuf::MessageLite* message) {
  // We create a new coded stream for each message.  Don't worry, this is fast,
  // and it makes sure the 64MB total size limit is imposed per-message rather
  // than on the whole stream.  (See the CodedInputStream interface for more
  // info on this limit.)
  google::protobuf::io::CodedInputStream input(rawInput);

  // Read the size.
  uint32_t size;
  if (!input.ReadVarint32(&size)) return false;

  // Tell the stream not to read beyond that size.
  google::protobuf::io::CodedInputStream::Limit limit =
      input.PushLimit(size);

  // Parse the message.
  if (!message->MergeFromCodedStream(&input)) return false;
  if (!input.ConsumedEntireMessage()) return false;

  // Release the limit.
  input.PopLimit(limit);

  return true;
}

Answer 2

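Okay, so I haven't been able to find top-level C++ functions implementing what I need, but some spelunking through the Java API reference turned up the following, inside the MessageLite interface: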

void writeDelimitedTo(OutputStream output)
/*  Like writeTo(OutputStream), but writes the size of 
    the message as a varint before writing the data.   */

So the Java size prefix is a (Protocol Buffers) varint!
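
To make the wire format concrete: a varint stores an integer seven bits at a time, least-significant group first, and sets the high bit on every byte except the last. The following is a minimal Python 3 sketch of that framing without the protobuf library. The names encode_varint, decode_varint, and frame are made up for illustration, and the payload stands in for an already-serialized message.

```python
# Hypothetical helper names; none of these are part of the protobuf API.

def encode_varint(value):
    """Encode a non-negative integer as a Protocol Buffers varint."""
    if value < 0:
        raise ValueError("size prefixes are non-negative")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # last byte: high bit clear
            return bytes(out)


def decode_varint(buf, pos=0):
    """Decode a varint from buf starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def frame(payload):
    """Prefix already-serialized message bytes with their varint length."""
    return encode_varint(len(payload)) + payload


framed = frame(b"x" * 300)
size, start = decode_varint(framed)
print(framed[:2].hex(), size, start)  # prints: ac02 300 2
```

Lengths below 128 fit in a single prefix byte; anything longer needs two or more.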

Armed with that information, I went digging through the C++ API and found the CodedStream header, which has these:

bool CodedInputStream::ReadVarint32(uint32 * value)
void CodedOutputStream::WriteVarint32(uint32 value)

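Using those, I should be able to roll my own C++ functions that do the job.

They should really add this to the main Message API though; it is missing functionality, considering that Java has it, and so does Marc Gravell's excellent protobuf-net C# port (via SerializeWithLengthPrefix and DeserializeWithLengthPrefix).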

Answer 3

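I solved the same problem using CodedOutputStream/ArrayOutputStream to write the message (with the size) and CodedInputStream/ArrayInputStream to read the message (with the size).

For example, the following pseudo-code writes the message size followed by the message: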

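#include <google/protobuf/io/coded_stream.h>
#include <google/protobuf/io/zero_copy_stream_impl_lite.h>

const unsigned bufLength = 256;
unsigned char buffer[bufLength];
Message protoMessage;  // stands in for your generated message type

google::protobuf::io::ArrayOutputStream arrayOutput(buffer, bufLength);
google::protobuf::io::CodedOutputStream codedOutput(&arrayOutput);

// Fixed 4-byte little-endian size prefix. This is not the varint prefix that
// Java's writeDelimitedTo emits; use WriteVarint32 instead for that format.
codedOutput.WriteLittleEndian32(protoMessage.ByteSize());
protoMessage.SerializeToCodedStream(&codedOutput);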

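When writing, you should also check that your buffer is large enough to fit the message (including the size). And when reading, you should check that your buffer contains a whole message (including the size).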
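
Those two checks can be sketched independently of protobuf. The Python 3 code below mirrors a fixed 4-byte little-endian size prefix; write_frame and try_read_frame are hypothetical helper names, and the payload is assumed to be an already-serialized message.

```python
import struct

# Hypothetical helpers; not part of the protobuf API.
PREFIX = struct.Struct("<I")  # 4-byte little-endian unsigned size


def write_frame(buffer, payload):
    """Copy size prefix + payload into a fixed-size bytearray.

    Returns the number of bytes used, or raises if the frame does not fit.
    """
    needed = PREFIX.size + len(payload)
    if needed > len(buffer):
        raise ValueError("buffer too small: need %d bytes, have %d"
                         % (needed, len(buffer)))
    PREFIX.pack_into(buffer, 0, len(payload))
    buffer[PREFIX.size:needed] = payload
    return needed


def try_read_frame(data):
    """Return the payload if data holds a whole frame, else None."""
    if len(data) < PREFIX.size:
        return None  # the size prefix itself is incomplete
    (size,) = PREFIX.unpack_from(data, 0)
    end = PREFIX.size + size
    if len(data) < end:
        return None  # the payload is incomplete
    return bytes(data[PREFIX.size:end])


buf = bytearray(256)
used = write_frame(buf, b"hello")
# A complete frame yields the payload; a frame one byte short yields None.
print(used, try_read_frame(buf[:used]), try_read_frame(buf[:used - 1]))
```

Returning None for an incomplete frame lets a caller keep accumulating bytes and retry, rather than treating a short read as an error.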

It definitely would be handy if they added convenience methods to the C++ API similar to those provided by the Java API.

Answer 4

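IstreamInputStream is very fragile to EOFs and other errors that easily occur when it is used together with std::istream. After such an error the protobuf streams are permanently damaged, and any buffered data already consumed is lost. Protobuf does have proper support for reading from traditional streams.

Implement google::protobuf::io::CopyingInputStream and use it together with CopyingInputStreamAdaptor. Do the same for the output variant.

In practice, a parsing call ends up in google::protobuf::io::CopyingInputStream::Read(void* buffer, int size), where a buffer is given. The only thing left to do is to read into it somehow.

Here is an example for use with Asio synchronous streams (SyncReadStream / SyncWriteStream):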

#include <boost/asio.hpp>
#include <google/protobuf/io/coded_stream.h>
#include <google/protobuf/io/zero_copy_stream_impl_lite.h>

using namespace google::protobuf::io;


template <typename SyncReadStream>
class AsioInputStream : public CopyingInputStream {
    public:
        AsioInputStream(SyncReadStream& sock);
        int Read(void* buffer, int size);
    private:
        SyncReadStream& m_Socket;
};


template <typename SyncReadStream>
AsioInputStream<SyncReadStream>::AsioInputStream(SyncReadStream& sock) :
    m_Socket(sock) {}


template <typename SyncReadStream>
int
AsioInputStream<SyncReadStream>::Read(void* buffer, int size)
{
    std::size_t bytes_read;
    boost::system::error_code ec;
    bytes_read = m_Socket.read_some(boost::asio::buffer(buffer, size), ec);

    if(!ec) {
        return bytes_read;
    } else if (ec == boost::asio::error::eof) {
        return 0;
    } else {
        return -1;
    }
}


template <typename SyncWriteStream>
class AsioOutputStream : public CopyingOutputStream {
    public:
        AsioOutputStream(SyncWriteStream& sock);
        bool Write(const void* buffer, int size);
    private:
        SyncWriteStream& m_Socket;
};


template <typename SyncWriteStream>
AsioOutputStream<SyncWriteStream>::AsioOutputStream(SyncWriteStream& sock) :
    m_Socket(sock) {}


template <typename SyncWriteStream>
bool
AsioOutputStream<SyncWriteStream>::Write(const void* buffer, int size)
{   
    boost::system::error_code ec;
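    // write_some may send only part of the buffer; boost::asio::write loops
    // until all `size` bytes are written or an error occurs.
    boost::asio::write(m_Socket, boost::asio::buffer(buffer, size), ec);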
    return !ec;
}

Usage:

// m_Socket is an instance of boost::asio::ip::tcp::socket
AsioInputStream<boost::asio::ip::tcp::socket> ais(m_Socket);
CopyingInputStreamAdaptor cis_adp(&ais);
CodedInputStream cis(&cis_adp);

Message protoMessage;  // stands in for your generated message type
uint32_t msg_size;

/* Read message size */
if(!cis.ReadVarint32(&msg_size)) {
    // Handle error
}

/* Make sure not to read beyond limit of message */
CodedInputStream::Limit msg_limit = cis.PushLimit(msg_size);
if(!protoMessage.ParseFromCodedStream(&cis)) {
    // Handle error
}

/* Remove limit */
cis.PopLimit(msg_limit);

Answer 5

Here you go:

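#include <cassert>
#include <cstdint>
#include <fstream>
#include <iostream>
#include <string>

#include <google/protobuf/io/zero_copy_stream_impl.h>
#include <google/protobuf/io/coded_stream.h>
#include <google/protobuf/message.h>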

using namespace google::protobuf::io;

class FASWriter 
{
    std::ofstream mFs;
    OstreamOutputStream *_OstreamOutputStream;
    CodedOutputStream *_CodedOutputStream;
public:
    FASWriter(const std::string &file) : mFs(file,std::ios::out | std::ios::binary)
    {
        assert(mFs.good());

        _OstreamOutputStream = new OstreamOutputStream(&mFs);
        _CodedOutputStream = new CodedOutputStream(_OstreamOutputStream);
    }

    inline void operator()(const ::google::protobuf::Message &msg)
    {
        _CodedOutputStream->WriteVarint32(msg.ByteSize());

        if ( !msg.SerializeToCodedStream(_CodedOutputStream) )
            std::cout << "SerializeToCodedStream error " << std::endl;
    }

    ~FASWriter()
    {
        delete _CodedOutputStream;
        delete _OstreamOutputStream;
        mFs.close();
    }
};

class FASReader
{
    std::ifstream mFs;

    IstreamInputStream *_IstreamInputStream;
    CodedInputStream *_CodedInputStream;
public:
    FASReader(const std::string &file) : mFs(file,std::ios::in | std::ios::binary)
    {
        assert(mFs.good());

        _IstreamInputStream = new IstreamInputStream(&mFs);
        _CodedInputStream = new CodedInputStream(_IstreamInputStream);      
    }

    template<class T>
    bool ReadNext()
    {
        T msg;
        uint32_t size;  // portable replacement for the MSVC-only unsigned __int32

        bool ret;
        if ( ret = _CodedInputStream->ReadVarint32(&size) )
        {   
            CodedInputStream::Limit msgLimit = _CodedInputStream->PushLimit(size);
            if ( ret = msg.ParseFromCodedStream(_CodedInputStream) )
            {
                _CodedInputStream->PopLimit(msgLimit);      
                std::cout << "FASReader ReadNext: " << msg.DebugString() << std::endl;
            }
        }

        return ret;
    }

    ~FASReader()
    {
        delete _CodedInputStream;
        delete _IstreamInputStream;
        mFs.close();
    }
};

Answer 6

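I ran into the same issue in both C++ and Python.

For the C++ version, I used a mix of the code Kenton Varda posted in this thread and the code from the pull request he sent to the protobuf team (because the version posted here doesn't handle EOF, while the one he sent to GitHub does).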

#include <google/protobuf/message_lite.h>
#include <google/protobuf/io/zero_copy_stream.h>
#include <google/protobuf/io/coded_stream.h>


bool writeDelimitedTo(const google::protobuf::MessageLite& message,
    google::protobuf::io::ZeroCopyOutputStream* rawOutput)
{
    // We create a new coded stream for each message.  Don't worry, this is fast.
    google::protobuf::io::CodedOutputStream output(rawOutput);

    // Write the size.
    const int size = message.ByteSize();
    output.WriteVarint32(size);

    uint8_t* buffer = output.GetDirectBufferForNBytesAndAdvance(size);
    if (buffer != NULL)
    {
        // Optimization:  The message fits in one buffer, so use the faster
        // direct-to-array serialization path.
        message.SerializeWithCachedSizesToArray(buffer);
    }

    else
    {
        // Slightly-slower path when the message is multiple buffers.
        message.SerializeWithCachedSizes(&output);
        if (output.HadError())
            return false;
    }

    return true;
}

bool readDelimitedFrom(google::protobuf::io::ZeroCopyInputStream* rawInput, google::protobuf::MessageLite* message, bool* clean_eof)
{
    // We create a new coded stream for each message.  Don't worry, this is fast,
    // and it makes sure the 64MB total size limit is imposed per-message rather
    // than on the whole stream.  (See the CodedInputStream interface for more
    // info on this limit.)
    google::protobuf::io::CodedInputStream input(rawInput);
    const int start = input.CurrentPosition();
    if (clean_eof)
        *clean_eof = false;


    // Read the size.
    uint32_t size;
    if (!input.ReadVarint32(&size))
    {
        if (clean_eof)
            *clean_eof = input.CurrentPosition() == start;
        return false;
    }
    // Tell the stream not to read beyond that size.
    google::protobuf::io::CodedInputStream::Limit limit = input.PushLimit(size);

    // Parse the message.
    if (!message->MergeFromCodedStream(&input)) return false;
    if (!input.ConsumedEntireMessage()) return false;

    // Release the limit.
    input.PopLimit(limit);

    return true;
}

Here is my Python 2 implementation:

from google.protobuf.internal import encoder
from google.protobuf.internal import decoder

#I had to implement this because the tools in google.protobuf.internal.decoder
#read from a buffer, not from a file-like object
def readRawVarint32(stream):
    mask = 0x80 # (1 << 7)
    raw_varint32 = []
    while 1:
        b = stream.read(1)
        #eof
        if b == "":
            break
        raw_varint32.append(b)
        if not (ord(b) & mask):
            #we found a byte starting with a 0, which means it's the last byte of this varint
            break
    return raw_varint32

def writeDelimitedTo(message, stream):
    message_str = message.SerializeToString()
    delimiter = encoder._VarintBytes(len(message_str))
    stream.write(delimiter + message_str)

def readDelimitedFrom(MessageType, stream):
    raw_varint32 = readRawVarint32(stream)
    message = None

    if raw_varint32:
        size, _ = decoder._DecodeVarint32(raw_varint32, 0)

        data = stream.read(size)
        if len(data) < size:
            raise Exception("Unexpected end of file")

        message = MessageType()
        message.ParseFromString(data)

    return message

#In place version that takes an already built protobuf object
#In my tests, this is around 20% faster than the other version 
#of readDelimitedFrom()
def readDelimitedFrom_inplace(message, stream):
    raw_varint32 = readRawVarint32(stream)

    if raw_varint32:
        size, _ = decoder._DecodeVarint32(raw_varint32, 0)

        data = stream.read(size)
        if len(data) < size:
            raise Exception("Unexpected end of file")

        message.ParseFromString(data)

        return message
    else:
        return None

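It might not be the best-looking code, and I'm sure it can be refactored a fair bit, but it should at least show you one way to do it.

Now the big problem: it's slow.

Even when using the C++ implementation of python-protobuf, it is an order of magnitude slower than pure C++. I have a benchmark in which I read 10M protobuf messages of about 30 bytes each from a file. It takes about 0.9 seconds in C++ and 35 seconds in Python.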

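One way to make it a bit faster would be to re-implement the varint decoder so that it reads from the file and decodes in one go, instead of reading from the file and then decoding as this code currently does. (Profiling shows that a significant amount of time is spent in the varint encoder/decoder.) But needless to say, that alone is not enough to close the gap between the Python version and the C++ one.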
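
The one-pass idea could look roughly like the following: decode each byte as it is read, instead of collecting the bytes first and decoding them afterwards. This is a Python 3 sketch with hypothetical names (read_varint_from_stream, read_delimited_bytes). It returns the raw payload and leaves the ParseFromString call to the caller, and it has not been benchmarked against the code above.

```python
import io

# Hypothetical helper names; not part of the protobuf API.

def read_varint_from_stream(stream):
    """Read and decode one varint in a single pass.

    Returns None on a clean EOF (no bytes available before the varint).
    """
    result = 0
    shift = 0
    while True:
        b = stream.read(1)
        if not b:
            if shift == 0:
                return None  # clean end of stream
            raise EOFError("stream ended inside a varint")
        byte = b[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7


def read_delimited_bytes(stream):
    """Return the next length-prefixed payload, or None at end of stream."""
    size = read_varint_from_stream(stream)
    if size is None:
        return None
    data = stream.read(size)
    if len(data) < size:
        raise EOFError("unexpected end of file")
    return data


stream = io.BytesIO(b"\x03abc\x02hi")
print(read_delimited_bytes(stream), read_delimited_bytes(stream),
      read_delimited_bytes(stream))  # prints: b'abc' b'hi' None
```

The stream is assumed to be opened in binary mode, so that read(1) returns bytes.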

Any ideas for making it faster are very welcome :)

Answer 7

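For completeness, I am posting an up-to-date version here that works with the master version of protobuf and with Python 3.

For the C++ version, it is sufficient to use the utilities in delimited_message_util.h; here is a MWE: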

#include <google/protobuf/io/zero_copy_stream_impl.h>
#include <google/protobuf/util/delimited_message_util.h>

#include <fcntl.h>
#include <unistd.h>

#include <deque>
#include <iostream>
#include <string>

template <typename T>
bool writeManyToFile(const std::deque<T>& messages, const std::string& filename) {
    // O_CREAT requires an explicit permission mode.
    int outfd = open(filename.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (outfd < 0) return false;
    google::protobuf::io::FileOutputStream fout(outfd);

    bool success = true;
    for (const auto& msg : messages) {
        success = google::protobuf::util::SerializeDelimitedToZeroCopyStream(
            msg, &fout);
        if (!success) {
            std::cout << "Writing Failed" << std::endl;
            break;
        }
    }
    // Close() flushes the buffer and closes the underlying file descriptor.
    fout.Close();
    return success;
}

template <typename T>
std::deque<T> readManyFromFile(const std::string& filename) {
    std::deque<T> out;
    int infd = open(filename.c_str(), O_RDONLY);
    if (infd < 0) return out;

    google::protobuf::io::FileInputStream fin(infd);
    bool keep = true;
    bool clean_eof = true;

    while (keep) {
        T msg;
        // clean_eof is set to true when the stream ends exactly on a
        // message boundary.
        keep = google::protobuf::util::ParseDelimitedFromZeroCopyStream(
            &msg, &fin, &clean_eof);
        if (keep)
            out.push_back(msg);
    }
    // Close() also closes the underlying file descriptor.
    fin.Close();
    return out;
}

For the Python 3 version, building on @fireboot's answer, the only thing that needed modification is the decoding of raw_varint32:

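def getSize(raw_varint32):
    # raw_varint32 is the list of single-byte reads from readRawVarint32.
    # Under Python 3 its EOF check must compare against b"" rather than "".
    result = 0
    shift = 0
    for b in raw_varint32:
        # The low 7 bits of each byte hold data, least-significant group first.
        result |= (ord(b) & 0x7f) << shift
        shift += 7
    return result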

def readDelimitedFrom(MessageType, stream):
    raw_varint32 = readRawVarint32(stream)
    message = None

    if raw_varint32:
        size = getSize(raw_varint32)

        data = stream.read(size)
        if len(data) < size:
            raise Exception("Unexpected end of file")

        message = MessageType()
        message.ParseFromString(data)

    return message

Answer 8

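I was also looking for a solution to this. Here is the core of our solution, assuming some Java code wrote many MyRecord messages to a file with writeDelimitedTo. Open the file and loop, doing: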

uint32_t bytes;
if(someCodedInputStream->ReadVarint32(&bytes)) {
  CodedInputStream::Limit msgLimit = someCodedInputStream->PushLimit(bytes);
  if(myRecord->ParseFromCodedStream(someCodedInputStream)) {
    //do your stuff with the parsed MyRecord instance
  } else {
    //handle parse error
  }
  someCodedInputStream->PopLimit(msgLimit);
} else {
  //maybe end of file
}

Hope it helps.
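
The same loop (read the size, bound the read, take the record, repeat) can be sketched in Python 3 over an in-memory buffer. iter_delimited is a hypothetical name; it yields the raw bytes of each record, which a caller would then hand to its message type's ParseFromString.

```python
# Hypothetical helper name; not part of the protobuf API.

def iter_delimited(buf):
    """Yield each varint-length-prefixed record found in buf."""
    pos = 0
    while pos < len(buf):  # pos == len(buf) is a clean end of file
        # Decode the varint size prefix.
        size = 0
        shift = 0
        while True:
            if pos >= len(buf):
                raise ValueError("truncated size prefix")
            byte = buf[pos]
            pos += 1
            size |= (byte & 0x7F) << shift
            if not byte & 0x80:
                break
            shift += 7
        # The record must not extend past the end of the buffer.
        end = pos + size
        if end > len(buf):
            raise ValueError("truncated record")
        yield buf[pos:end]
        pos = end


data = b"\x03abc" + b"\x00" + b"\x02hi"
print(list(iter_delimited(data)))  # prints: [b'abc', b'', b'hi']
```

Running out of data exactly on a record boundary ends the iteration quietly, while running out inside a prefix or a record raises, which separates a normal end of file from a truncated one.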

Answer 9

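Working with an Objective-C version of protocol-buffers, I ran into this exact issue. When sending from the iOS client to a Java-based server that uses parseDelimitedFrom, which expects the length as the first byte, I needed to call writeRawByte on the CodedOutputStream first. Posting here in the hope of helping others who run into this issue. While working through it, one would think that Google proto-bufs would come with a simple flag that does this for you...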

    Request* request = [rBuild build];

    [self sendMessage:request];
} 


- (void) sendMessage:(Request *) request {

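    //** get length (a single raw byte matches the varint prefix only while
    //** the message is shorter than 128 bytes)
    NSData* n = [request data];
    uint8_t len = [n length];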

    PBCodedOutputStream* os = [PBCodedOutputStream streamWithOutputStream:outputStream];
    //** prepend it to message, such that Request.parseDelimitedFrom(in) can parse it properly
    [os writeRawByte:len];
    [request writeToCodedOutputStream:os];
    [os flush];
}

Answer 10

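Since I'm not allowed to write this as a comment on Kenton Varda's answer above: I believe there is a bug in the code he posted (as well as in the other answers that have been provided). The following code: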

...
google::protobuf::io::CodedInputStream input(rawInput);

// Read the size.
uint32_t size;
if (!input.ReadVarint32(&size)) return false;

// Tell the stream not to read beyond that size.
google::protobuf::io::CodedInputStream::Limit limit =
    input.PushLimit(size);
...

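sets an incorrect limit, because it does not take into account the size of the varint32 that has already been read from the input. This can result in data loss or corruption, as additional bytes are read from the stream which may be part of the next message. The usual way of handling this correctly is to delete the CodedInputStream used to read the size and create a new one for reading the payload: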

...
uint32_t size;
{
  google::protobuf::io::CodedInputStream input(rawInput);

  // Read the size.
  if (!input.ReadVarint32(&size)) return false;
}

google::protobuf::io::CodedInputStream input(rawInput);

// Tell the stream not to read beyond that size.
google::protobuf::io::CodedInputStream::Limit limit =
    input.PushLimit(size);
...

Answer 11

You can use getline to read a string from a stream, using the specified delimiter:

istream& getline ( istream& is, string& str, char delim );

(defined in the header <string>)

11 answers

Answer 1: 81, accepted, 2014-04-08 03:49:06
Answer 2: 17, 2010-02-26 12:53:21
Answer 3: 12, 2010-02-26 13:19:01
Answer 4: 8
Answer 5: 7, 2012-10-02 06:06:25
Answer 6: 7, 2015-12-31 01:13:36
Answer 7: 4, 2019-11-27 15:31:51
Answer 8: 3, 2011-06-30 09:43:42
Answer 9: 0, 2013-12-13 21:24:14
Answer 10: 0, 2016-04-05 01:59:28
Answer 11: -7, 2010-02-26 10:20:05
