Error using boost serialization with binary archive

I get the following error while reading from boost::archive::binary_iarchive into my variable:

test-serialization(9285,0x11c62fdc0) malloc: can't allocate region
*** mach_vm_map(size=18014398509486080) failed (error code=3)
test-serialization(9285,0x11c62fdc0) malloc: *** set a breakpoint in malloc_error_break to debug

My serialization and deserialization code is:

template<class Archive>
void save(Archive & archive, const helib::PubKey & pubkey, const unsigned int version){
  BOOST_TEST_MESSAGE("inside save_construct_data");
  archive << &(pubkey.context);
  archive << pubkey.skBounds;
  archive << pubkey.keySwitching;
  archive << pubkey.keySwitchMap;
  archive << pubkey.KS_strategy;
  archive << pubkey.recryptKeyID;
}

template<class Archive>
void load_construct_data(Archive & archive, helib::PubKey * pubkey, const unsigned int version){
  helib::Context * context = new helib::Context(2,3,1); //random numbers since there is no default constructor
  BOOST_TEST_MESSAGE("deserializing context");
  archive >> context;
  std::vector<double> skBounds;
  std::vector<helib::KeySwitch> keySwitching;
  std::vector<std::vector<long>> keySwitchMap;
  NTL::Vec<long> KS_strategy;
  long recryptKeyID;
  BOOST_TEST_MESSAGE("deserializing skbounds");
  archive >> skBounds;
  BOOST_TEST_MESSAGE("deserializing keyswitching");
  archive >> keySwitching;
  BOOST_TEST_MESSAGE("deserializing keyswitchmap");
  archive >> keySwitchMap;
  BOOST_TEST_MESSAGE("deserializing KS_strategy");
  archive >> KS_strategy;
  BOOST_TEST_MESSAGE("deserializing recryptKeyID");
  archive >> recryptKeyID;
  BOOST_TEST_MESSAGE("new pubkey");
  ::new(pubkey)helib::PubKey(*context);
  //TODO: complete
}

template<class Archive>
void serialize(Archive & archive, helib::PubKey & pubkey, const unsigned int version){
  split_free(archive, pubkey, version);
}

template<class Archive>
void load(Archive & archive, helib::PubKey & pubkey, const unsigned int version){
}

The test that calls the code is the following:

BOOST_AUTO_TEST_CASE(serialization_pubkey)
{
  auto context = helibTestContext();
  helib::SecKey secret_key(context);
  secret_key.GenSecKey();
  // Compute key-switching matrices that we need
  helib::addSome1DMatrices(secret_key);
  // Set the secret key (upcast: SecKey is a subclass of PubKey)
  const helib::PubKey& original_pubkey = secret_key;

  std::string filename = "pubkey.serialized";

  std::ofstream os(filename, std::ios::binary);
  {
    boost::archive::binary_oarchive oarchive(os);
    oarchive << original_pubkey;
  }

  helib::PubKey * restored_pubkey = new helib::PubKey(helib::Context(2,3,1));
  {
    std::ifstream ifs(filename, std::ios::binary);
    boost::archive::binary_iarchive iarchive(ifs);
    BOOST_TEST_CHECKPOINT("calling deserialization");
    iarchive >> restored_pubkey;
    BOOST_TEST_CHECKPOINT("done with deserialization");

    //tests omitted
  }
}

Considerations:

  1. Serialization works fine with both boost::archive::text_oarchive and boost::archive::binary_oarchive. They create files of 46M and 21M respectively (big, I know).

  2. Deserialization with boost::archive::text_iarchive basically stopped at the execution of archive >> keySwitching; and the process was automatically killed. This is in fact the biggest part of the archive.

  3. I decided to try boost::archive::binary_iarchive since the file is half the size, but I get the error shown at the beginning. The error happens while executing the first read from the archive: archive >> context;

  4. The asymmetry between input and output (save and load_construct_data) is because I could not find another way to avoid implementing the serialization of a derived class of helib::PubKey. Using a pointer to helib::PubKey was giving me compilation errors asking for the serialization of the derived class. If there is some other way, I'm all ears.

Thank you for your help.

UPDATE:

I am implementing deserialization for some classes in the cryptographic library HElib because I need to send ciphertext over the wire. One of these classes is helib::PubKey. I'm using the boost serialization library for the implementation. I have created a gist to provide a reprex as suggested in the comments. There are 3 files:

  1. serialization.hpp, which contains the serialization implementation. Unfortunately, helib::PubKey depends on many other classes, making the file rather long. All the other classes have unit tests that pass. Furthermore, I had to make a tiny modification to the class in order to serialize it: I made the private members public.
  2. test-serialization.cpp, which contains the unit test.
  3. Makefile. Running make creates the executable test-serialization.

vector<bool> strikes again

It's actually allocating for 0x1fffffffff20000 bits (that's 144 petabits) on my test box. That's coming directly from IndexSet::resize().

Now I have serious questions about HElib using std::vector<bool> here (it seems they would be far better served with something like boost::icl::interval_set<>).

Well. That was a wild goose chase (that IndexSet serialization can be much improved). However, the real problem is that you had Undefined Behaviour because you don't deserialize the same type as you serialize.

You serialize a PubKey, but attempt to deserialize as PubKey*. Uh-oh.
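To make it concrete why a type mismatch in a binary archive tends to show up as an absurd allocation request, here is a minimal sketch using only the standard library. It is not Boost's actual archive format: write_raw, read_raw and misread_length are hypothetical helpers, and the "archive" is just raw bytes with no type information. The writer stores a double; the mismatched reader interprets the same bytes as a container length.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>

// Toy "binary archive": raw bytes only, no type information is stored.
// (Hypothetical helpers for illustration, not Boost.Serialization API.)
template <typename T>
void write_raw(std::ostream& os, const T& v) {
    os.write(reinterpret_cast<const char*>(&v), sizeof v);
}

template <typename T>
T read_raw(std::istream& is) {
    T v{};
    is.read(reinterpret_cast<char*>(&v), sizeof v);
    return v;
}

// The writer stores a double; a mismatched reader takes the same
// eight bytes to be the element count of a container.
std::uint64_t misread_length(double written) {
    std::stringstream ss(std::ios::in | std::ios::out | std::ios::binary);
    write_raw(ss, written);
    return read_raw<std::uint64_t>(ss);
}
```

On a platform with IEEE 754 doubles, misread_length(1.0) comes back as 0x3FF0000000000000. A reader that then calls resize() with that value asks the allocator for an impossible amount of memory, which is the kind of failure shown in the question.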

Now beyond that, there are quite a few problems:

  • You had to modify the library to make private members public. This can easily violate ODR (making the class layout incompatible).

  • You seem to treat the context as a "dynamic" resource, which will engage Object Tracking. This could be a viable approach. BUT. You'll have to think about ownership.

    It seems like you didn't do that yet. For example, the line in load_construct_data for DoubleCRT is a definite memory leak:

     helib::Context * context = new helib::Context(2,3,1);

    You never use it nor ever free it. In fact, you simply overwrite it with the deserialized instance, which may or may not be owned. Catch-22.

    Exactly the same happens in load_construct_data for PubKey.

  • Worse, in save_construct_data you completely gratuitously copy context objects for each DoubleCRT in each SecKey:

     auto context = polynomial->getContext();
     archive << &context;

    Because you fake it out as pointer-serialization, (obviously useless) object tracking kicks in again, which just means you serialize redundant Context copies that will all be leaked on deserialization.

  • I'd be tempted to assume the context instances in both would always be the same? Why not serialize the context(s) separately anyway?

  • In fact I went and analyzed the HElib source code to check these assumptions. It turns out I was correct. Nothing ever constructs a context outside of

    std::unique_ptr<Context> buildContextFromBinary(std::istream& str);
    std::unique_ptr<Context> buildContextFromAscii(std::istream& str);

    As you can see, they return owned pointers. You should have been using them. Perhaps even with the built-in serialization, which I practically stumbled over here.
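The ownership point can be sketched without HElib. In the snippet below, Ctx is a hypothetical stand-in for helib::Context (no default constructor, plus an instance counter so leaks are observable), and build_ctx is a made-up factory in the style of buildContextFromBinary. Treat it as an illustration of the pattern under those assumptions, not as the library's API.

```cpp
#include <istream>
#include <memory>
#include <sstream>

// Hypothetical stand-in for helib::Context: no default constructor,
// and it counts live instances so that leaks become visible.
struct Ctx {
    static int& live() { static int n = 0; return n; }
    long m, p, r;
    Ctx(long m_, long p_, long r_) : m(m_), p(p_), r(r_) { ++live(); }
    Ctx(const Ctx&) = delete;
    ~Ctx() { --live(); }
};

// Made-up factory in the style of buildContextFromBinary: the parameters
// come from the stream, and the caller receives an owning pointer.
std::unique_ptr<Ctx> build_ctx(std::istream& is) {
    long m = 0, p = 0, r = 0;
    if (!(is >> m >> p >> r)) return nullptr;
    return std::make_unique<Ctx>(m, p, r);
}

// Loads a context in a nested scope and reports how many are still alive.
int live_after_scoped_load() {
    std::istringstream is("32109 4999 1");
    {
        auto ctx = build_ctx(is); // no placeholder `new Ctx(2,3,1)` needed
    }                              // ctx is destroyed here
    return Ctx::live();
}
```

Because the factory hands back a std::unique_ptr, there is no placeholder object to overwrite and nothing left to free by hand.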

Time To Regroup

I'd use the serialization code from HElib (because, why reinvent the wheel and make a ton of bugs doing so?). If you insist on integration with Boost Serialization, you can have your cake and eat it:

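namespace iostreams = boost::iostreams; // alias assumed by the unqualified iostreams:: names below

template <class Archive> void save(Archive& archive, const helib::PubKey& pubkey, unsigned) {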
    using V = std::vector<char>;
    using D = iostreams::back_insert_device<V>;
    V data;
    {
        D dev(data);
        iostreams::stream_buffer<D> sbuf(dev);
        std::ostream os(&sbuf); // expose as std::ostream
        helib::writePubKeyBinary(os, pubkey);
    }
    archive << data;
}

template <class Archive> void load(Archive& archive, helib::PubKey& pubkey, unsigned) {
    std::vector<char> data;
    archive >> data;
    using S = iostreams::array_source;
    S source(data.data(), data.size());
    iostreams::stream_buffer<S> sbuf(source);
    {
        std::istream is(&sbuf); // expose as std::istream
        helib::readPubKeyBinary(is, pubkey);
    }
}

That's all. 24 lines of code. And it's gonna be tested and maintained by the library authors. You can't beat that (clearly). I've modified the tests a bit so we don't abuse private details anymore.

Cleaning Up The Code

By separating out a helper to deal with the blob writing, we can implement different helib types in a very similar way:

namespace helib { // leverage ADL
    template <class A> void save(A& ar, const Context& o, unsigned) {
        Blob data = to_blob(o, writeContextBinary);
        ar << data;
    }
    template <class A> void load(A& ar, Context& o, unsigned) {
        Blob data;
        ar >> data;
        from_blob(data, o, readContextBinary);
    }
    template <class A> void save(A& ar, const PubKey& o, unsigned) {
        Blob data = to_blob(o, writePubKeyBinary);
        ar << data;
    }
    template <class A> void load(A& ar, PubKey& o, unsigned) {
        Blob data;
        ar >> data;
        from_blob(data, o, readPubKeyBinary);
    }
}

This is elegance to me.
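If Boost.Iostreams is not available, roughly the same helper pair can be sketched with std::ostringstream and std::istringstream, at the cost of an extra copy of the bytes. This is a swapped-in variant for illustration, not the code from the gist; Point, writePoint and readPoint are hypothetical stand-ins for a helib type and its binary read/write functions.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

using Blob = std::vector<char>;

// Run `writer` against an in-memory stream and capture the bytes.
template <typename T, typename F>
Blob to_blob(const T& object, F writer) {
    std::ostringstream os(std::ios::binary);
    writer(os, object);
    const std::string s = os.str();
    return Blob(s.begin(), s.end());
}

// Feed the captured bytes back to `reader` through an in-memory stream.
template <typename T, typename F>
void from_blob(const Blob& data, T& object, F reader) {
    const std::string s(data.begin(), data.end());
    std::istringstream is(s, std::ios::binary);
    reader(is, object);
}

// Hypothetical toy type with free binary read/write functions.
struct Point { int x = 0, y = 0; };

void writePoint(std::ostream& os, const Point& p) {
    os.write(reinterpret_cast<const char*>(&p), sizeof p);
}

void readPoint(std::istream& is, Point& p) {
    is.read(reinterpret_cast<char*>(&p), sizeof p);
}

// Writes a Point to a blob and reads it back into a fresh object.
Point round_trip(const Point& in) {
    Point out;
    from_blob(to_blob(in, writePoint), out, readPoint);
    return out;
}
```

The Boost.Iostreams version avoids the intermediate std::string copies, which probably matters for a key of tens of megabytes.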

FULL LISTING

I have cloned a new gist https://gist.github.com/sehe/ba82a0329e4ec586363eb82d3f3b9326 that includes the following change-sets:

0079c07 Make it compile locally
b3b2cf1 Squelch the warnings
011b589 Endof investigations, regroup time

f4d79a6 Reimplemented using HElib binary IO
a403e97 Bitwise reproducible outputs

Only the last two commits contain changes related to the actual fixes.

I'll list the full code here too for posterity. There are a number of subtle reorganizations and ditto comments in the test code. You'd do well to read through them carefully to see whether you understand them and the implications suit your needs. I left comments describing why the test assertions are what they are.

  • File serialization.hpp

     #ifndef EVOTING_SERIALIZATION_H
     #define EVOTING_SERIALIZATION_H
     #define BOOST_TEST_MODULE main
     #include <helib/helib.h>
     #include <boost/serialization/split_free.hpp>
     #include <boost/serialization/vector.hpp>
     #include <boost/iostreams/stream_buffer.hpp>
     #include <boost/iostreams/device/back_inserter.hpp>
     #include <boost/iostreams/device/array.hpp>

     namespace /* file-static */ {
         using Blob = std::vector<char>;

         template <typename T, typename F>
         Blob to_blob(const T& object, F writer) {
             using D = boost::iostreams::back_insert_device<Blob>;
             Blob data;
             {
                 D dev(data);
                 boost::iostreams::stream_buffer<D> sbuf(dev);
                 std::ostream os(&sbuf); // expose as std::ostream
                 writer(os, object);
             }
             return data;
         }

         template <typename T, typename F>
         void from_blob(Blob const& data, T& object, F reader) {
             boost::iostreams::stream_buffer<boost::iostreams::array_source>
                 sbuf(data.data(), data.size());
             std::istream is(&sbuf); // expose as std::istream
             reader(is, object);
         }
     }

     namespace helib { // leverage ADL
         template <class A> void save(A& ar, const Context& o, unsigned) {
             Blob data = to_blob(o, writeContextBinary);
             ar << data;
         }
         template <class A> void load(A& ar, Context& o, unsigned) {
             Blob data;
             ar >> data;
             from_blob(data, o, readContextBinary);
         }
         template <class A> void save(A& ar, const PubKey& o, unsigned) {
             Blob data = to_blob(o, writePubKeyBinary);
             ar << data;
         }
         template <class A> void load(A& ar, PubKey& o, unsigned) {
             Blob data;
             ar >> data;
             from_blob(data, o, readPubKeyBinary);
         }
     }

     BOOST_SERIALIZATION_SPLIT_FREE(helib::Context)
     BOOST_SERIALIZATION_SPLIT_FREE(helib::PubKey)
     #endif //EVOTING_SERIALIZATION_H

  • File test-serialization.cpp

     #define BOOST_TEST_MODULE main
     #include <boost/test/included/unit_test.hpp>
     #include <helib/helib.h>
     #include <fstream>
     #include "serialization.hpp"
     #include <boost/archive/text_oarchive.hpp>
     #include <boost/archive/text_iarchive.hpp>
     #include <boost/archive/binary_oarchive.hpp>
     #include <boost/archive/binary_iarchive.hpp>

     helib::Context helibTestMinimalContext(){
       // Plaintext prime modulus
       unsigned long p = 4999;
       // Cyclotomic polynomial - defines phi(m)
       unsigned long m = 32109;
       // Hensel lifting (default = 1)
       unsigned long r = 1;
       return helib::Context(m, p, r);
     }

     helib::Context helibTestContext(){
       auto context = helibTestMinimalContext();

       // Number of bits of the modulus chain
       unsigned long bits = 300;
       // Number of columns of Key-Switching matrix (default = 2 or 3)
       unsigned long c = 2;

       // Modify the context, adding primes to the modulus chain
       buildModChain(context, bits, c);
       return context;
     }

     BOOST_AUTO_TEST_CASE(serialization_pubkey)
     {
       auto context = helibTestContext();
       helib::SecKey secret_key(context);
       secret_key.GenSecKey();
       // Compute key-switching matrices that we need
       helib::addSome1DMatrices(secret_key);
       // Set the secret key (upcast: SecKey is a subclass of PubKey)
       const helib::PubKey& original_pubkey = secret_key;

       std::string const filename = "pubkey.serialized";

       {
         std::ofstream os(filename, std::ios::binary);
         boost::archive::binary_oarchive oarchive(os);
         oarchive << context << original_pubkey;
       }
       {
         // just checking reproducible output
         std::ofstream os(filename + ".2", std::ios::binary);
         boost::archive::binary_oarchive oarchive(os);
         oarchive << context << original_pubkey;
       }

       // reading back to independent instances of Context/PubKey
       {
         // NOTE: if you start from something rogue, it will fail with PAlgebra mismatch.
         helib::Context surrogate = helibTestMinimalContext();

         std::ifstream ifs(filename, std::ios::binary);
         boost::archive::binary_iarchive iarchive(ifs);
         iarchive >> surrogate;

         // we CAN test that the contexts end up matching
         BOOST_TEST((context == surrogate));

         helib::SecKey independent(surrogate);
         helib::PubKey& indep_pk = independent;
         iarchive >> indep_pk;
         // private again, as it should be, but to understand the relation:
         // BOOST_TEST((&independent.context == &surrogate));

         // The library's operator== compares the reference, so it would say "not equal"
         BOOST_TEST((indep_pk != original_pubkey));

         {
           // just checking reproducible output
           std::ofstream os(filename + ".3", std::ios::binary);
           boost::archive::binary_oarchive oarchive(os);
           oarchive << surrogate << indep_pk;
         }
       }

       // doing it the other way (sharing the context):
       {
         helib::PubKey restored_pubkey(context);
         {
           std::ifstream ifs(filename, std::ios::binary);
           boost::archive::binary_iarchive iarchive(ifs);
           iarchive >> context >> restored_pubkey;
         }
         // now `operator==` confirms equality
         BOOST_TEST((restored_pubkey == original_pubkey));

         {
           // just checking reproducible output
           std::ofstream os(filename + ".4", std::ios::binary);
           boost::archive::binary_oarchive oarchive(os);
           oarchive << context << restored_pubkey;
         }
       }
     }

TEST OUTPUT

time ./test-serialization -l all -r detailed
Running 1 test case...
Entering test module "main"
test-serialization.cpp(34): Entering test case "serialization_pubkey"
test-serialization.cpp(61): info: check (context == surrogate) has passed
test-serialization.cpp(70): info: check (indep_pk != original_pubkey) has passed
test-serialization.cpp(82): info: check (restored_pubkey == original_pubkey) has passed
test-serialization.cpp(34): Leaving test case "serialization_pubkey"; testing time: 36385217us
Leaving test module "main"; testing time: 36385273us

Test module "main" has passed with:
  1 test case out of 1 passed
  3 assertions out of 3 passed

  Test case "serialization_pubkey" has passed with:
    3 assertions out of 3 passed

real    0m36,698s
user    0m35,558s
sys     0m0,850s

Bitwise Reproducible Outputs

On repeated serialization it appears that indeed the output is bitwise identical, which may be an important property:

sha256sum pubkey.serialized*
66b95adbd996b100bff58774e066e7a309e70dff7cbbe08b5c77b9fa0f63c97f  pubkey.serialized
66b95adbd996b100bff58774e066e7a309e70dff7cbbe08b5c77b9fa0f63c97f  pubkey.serialized.2
66b95adbd996b100bff58774e066e7a309e70dff7cbbe08b5c77b9fa0f63c97f  pubkey.serialized.3
66b95adbd996b100bff58774e066e7a309e70dff7cbbe08b5c77b9fa0f63c97f  pubkey.serialized.4

Note that it is (obviously) not identical across runs (because it generates different key material).

Side Quest (The Wild Goose Chase)

One way to improve the IndexSet serialization code manually is to also use vector<bool>:

template<class Archive>
    void save(Archive & archive, const helib::IndexSet & index_set, const unsigned int version){
        std::vector<bool> elements;
        elements.resize(index_set.last()-index_set.first()+1);
        for (auto n : index_set)
            elements[n-index_set.first()] = true;
        archive << index_set.first() << elements;
    }

template<class Archive>
    void load(Archive & archive, helib::IndexSet & index_set, const unsigned int version){
        long first_ = 0;
        std::vector<bool> elements;
        archive >> first_ >> elements;
        index_set.clear();
        for (size_t n = 0; n < elements.size(); ++n) {
            if (elements[n])
                index_set.insert(n+first_);
        }
    }

A better idea would be to use dynamic_bitset (for which I happen to have contributed the serialization code (see How to serialize boost::dynamic_bitset?)):

template<class Archive>
    void save(Archive & archive, const helib::IndexSet & index_set, const unsigned int version){
        boost::dynamic_bitset<> elements;
        elements.resize(index_set.last()-index_set.first()+1);
        for (auto n : index_set)
            elements.set(n-index_set.first());
        archive << index_set.first() << elements;
    }

template<class Archive>
    void load(Archive & archive, helib::IndexSet & index_set, const unsigned int version) {
        long first_ = 0;
        boost::dynamic_bitset<> elements;
        archive >> first_ >> elements;
        index_set.clear();
        for (size_t n = elements.find_first(); n != elements.npos; n = elements.find_next(n))
            index_set.insert(n+first_);
    }

Of course, you would likely have to do similar things for IndexMap.
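The packing idea behind both variants (store the smallest element once, then one membership bit per offset) can be tried out without HElib or Boost. In this sketch std::set<long> is a stand-in for helib::IndexSet, and pack/unpack are hypothetical names. Unlike the snippets above, it guards the empty set explicitly rather than relying on what first() and last() return for an empty IndexSet.

```cpp
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// (offset of the first element, membership bits relative to that offset)
using Packed = std::pair<long, std::vector<bool>>;

// std::set<long> stands in for helib::IndexSet in this sketch.
Packed pack(const std::set<long>& s) {
    if (s.empty()) return Packed(0, std::vector<bool>());
    const long first = *s.begin();
    const long last = *s.rbegin();
    std::vector<bool> bits(static_cast<std::size_t>(last - first + 1), false);
    for (long n : s)
        bits[static_cast<std::size_t>(n - first)] = true;
    return Packed(first, bits);
}

// Rebuild the set by adding the stored offset back to every set bit.
std::set<long> unpack(const Packed& p) {
    std::set<long> s;
    for (std::size_t i = 0; i < p.second.size(); ++i)
        if (p.second[i])
            s.insert(p.first + static_cast<long>(i));
    return s;
}
```

The bit vector is only as long as the span between the smallest and largest element, so a garbage length on the reading side would again point at a type mismatch rather than at the data.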
