
Efficient message field setting in Python Protobuf

I am using Protobuf (v3.5.1) in a Python project I'm working on. My situation can be simplified to the following:

// Proto file

syntax = "proto3";

message Foo {
    Bar bar = 1;
}

message Bar {
    bytes lotta_bytes_here = 1;
}

# Python excerpt
def MakeFooUsingBar(bar):
    foo = Foo()
    foo.bar.CopyFrom(bar)  # copies bar's contents into foo.bar
    return foo

I am worried about the memory cost of .CopyFrom(): if I understand correctly, it copies the contents rather than storing a reference. In C++, I could use something like:

Foo foo;
Bar* bar = new Bar();
bar->set_lotta_bytes_here("abcd");
foo.set_allocated_bar(bar);

Judging by the generated source, this does not need to copy anything:

inline void Foo::set_allocated_bar(::Bar* bar) {
  ::google::protobuf::Arena* message_arena = GetArenaNoVirtual();
  if (message_arena == NULL) {
    delete bar_;
  }
  if (bar) {
    ::google::protobuf::Arena* submessage_arena = NULL;
    if (message_arena != submessage_arena) {
      bar = ::google::protobuf::internal::GetOwnedMessage(
          message_arena, bar, submessage_arena);
    }

  } else {

  }
  bar_ = bar;
  // @@protoc_insertion_point(field_set_allocated:Foo.bar)
}

Is there something similar available in Python? I have looked through the Python generated sources, but found nothing applicable.

For large string or bytes fields, Protobuf seems to handle the situation fairly well. The following test passes, which means that although a new Bar object is created, the bytes value itself is shared by reference rather than copied (Python bytes are immutable, so this is safe):

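def test_copy_from_with_large_bytes_field(self):
    bar = Bar()
    bar.lotta_bytes_here = b'12345'
    foo = Foo()
    foo.bar.CopyFrom(bar)

    # The submessage is a distinct object...
    self.assertIsNot(bar, foo.bar)
    # ...but both messages refer to the same bytes object.
    self.assertIs(bar.lotta_bytes_here, foo.bar.lotta_bytes_here)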

This solves my issue with large bytes objects. However, if your problem involves nested or repeated fields, this will not help: such fields are copied field by field. That makes sense, because when you copy a message you want the two to be independent. If they were not, changes to the original message would also modify the copy (and vice versa).
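To illustrate the distinction in plain Python, here is a minimal sketch. Holder and copy_from are hypothetical stand-ins that I made up for illustration; they are not protobuf classes or APIs and only mimic the behaviour observed in the test above: an immutable bytes value can be shared by reference, while a mutable container (analogous to a repeated field) has to be duplicated to keep the two objects independent.

```python
import copy


class Holder:
    """Hypothetical stand-in for a message; NOT a protobuf class."""

    def __init__(self, payload=b"", items=None):
        self.payload = payload  # immutable bytes value
        self.items = items if items is not None else []  # mutable, like a repeated field


def copy_from(dst, src):
    """Mimic the copy semantics described above (an assumption, not protobuf code)."""
    dst.payload = src.payload  # bytes are immutable, so sharing the reference is safe
    dst.items = copy.deepcopy(src.items)  # mutable data must be duplicated


src = Holder(payload=b"x" * 1_000_000, items=[1, 2, 3])
dst = Holder()
copy_from(dst, src)

# The large payload is not duplicated: both holders point at one bytes object.
shares_bytes = dst.payload is src.payload

# Changing the source list does not affect the copy.
src.items.append(4)
independent = dst.items == [1, 2, 3]

# Nobody can modify the shared bytes in place, which is why sharing is harmless.
try:
    src.payload[0] = 0
    mutation_blocked = False
except TypeError:
    mutation_blocked = True
```

If the list were shared the same way the bytes are, the append on src would show up in dst as well, which is exactly the aliasing a message copy is supposed to prevent.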

Something akin to C++ move semantics ( https://github.com/google/protobuf/issues/2791 ) or set_allocated_...() in Python protobuf would solve that case, but I am not aware of such a feature.
