
C++: read a map from a binary file that was created in Python

I created a Python script which creates the following map (illustration):

map<uint32_t, string> tempMap = {{2,"xx"}, {200, "yy"}};

and saved it as a map.out file (a binary file). When I try to read the binary file from C++, the map is not populated. Why?

    map<uint32_t, string> tempMap;
    ifstream readFile;
    std::streamsize length;

    readFile.open("somePath\\map.out", ios::binary | ios::in);
    if (readFile)
    {
        readFile.ignore( std::numeric_limits<std::streamsize>::max() );
        length = readFile.gcount();
        readFile.clear();   //  Since ignore will have set eof.
        readFile.seekg( 0, std::ios_base::beg );
        readFile.read((char *)&tempMap,length);
        for(auto &it: tempMap)
        {
          /* cout<<("%u, %s",it.first, it.second.c_str()); ->Prints map*/
        }
    }
    readFile.close();
    readFile.clear();

It's not possible to read in raw bytes and have them construct a map (or most containers, for that matter) [1], so you will have to write some code to perform proper serialization instead.

If the data being stored/loaded is simple, as in your example, then you can easily devise a scheme for how it might be serialized, and then write the code to load it. For example, a simple plaintext mapping can be established by writing the file with each member on its own line:

<number>
<string>
...

So for your example of:

std::map<std::uint32_t, std::string> tempMap = {{2,"xx"}, {200, "yy"}};

this could be encoded as:

2
xx
200
yy

In which case the code to deserialize this would simply read the values one by one and reconstruct the map:

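// Note: untested
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <limits>
#include <map>
#include <string>

auto loadMap(const std::filesystem::path& path) -> std::map<std::uint32_t, std::string>
{
  auto result = std::map<std::uint32_t, std::string>{};
  auto file = std::ifstream{path};

  while (true) {
    auto key = std::uint32_t{};
    auto value = std::string{};

    // Read the key; stop at end of file or on malformed input.
    if (!(file >> key)) { break; }
    // Discard the rest of the key's line, otherwise getline would
    // return the empty remainder instead of the value on the next line.
    file.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    if (!std::getline(file, value)) { break; }

    result[key] = std::move(value);
  }
  return result;
}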

Note: For this to work, your Python program must output exactly the format that your C++ program reads.

If the data you are trying to read/write is sufficiently complicated, you may want to look into serialization interchange formats. Since you're working between Python and C++, you'll need libraries that support both. For a list of recommendations, see the answers to "Cross-platform and language (de)serialization".
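To pin down exactly which bytes the loader expects, it can help to write the same format from C++ and compare the result against what the Python script emits. The following is only a sketch: `saveMap` and `serializeMap` are hypothetical helper names that are not part of the original answer, and the format assumes that values never contain a newline.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: writes each key and each value on its own line.
// Assumption: values contain no newline characters.
void saveMap(std::ostream& out, const std::map<std::uint32_t, std::string>& map)
{
  for (const auto& [key, value] : map) {
    assert(value.find('\n') == std::string::npos);
    out << key << '\n' << value << '\n';
  }
}

// Hypothetical helper: returns the serialized form as a string,
// which is convenient for comparing against the Python output.
std::string serializeMap(const std::map<std::uint32_t, std::string>& map)
{
  auto out = std::ostringstream{};
  saveMap(out, map);
  return out.str();
}
```

Passing a `std::ofstream` to `saveMap` would produce a file in the layout shown above. Keys come out in ascending order because `std::map` iterates in key order.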


[1] The reason you can't just read (or write) the whole container as bytes and have it work is that the data in containers isn't stored inline. Writing the raw bytes out won't automatically produce something like 2 xx\n200 yy\n for you. Instead, you'll be writing the raw addresses held in pointers to indirect data structures, such as the map's internal node objects.

For example, a hypothetical map implementation might contain a node like:
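One way to check that the elements are not stored inline is to compare an element's address with the address range of the map object itself. This is a rough sketch: `firstElementIsInline` is a hypothetical helper, and the result depends on the standard library implementation. Mainstream implementations allocate nodes separately, so the check is expected to return false there.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

using Map = std::map<std::uint32_t, std::string>;

// Hypothetical helper: reports whether the first element lives inside the
// sizeof(Map) bytes occupied by the map object itself.
bool firstElementIsInline(const Map& m)
{
  assert(!m.empty());
  const auto objectBegin = reinterpret_cast<std::uintptr_t>(&m);
  const auto objectEnd = objectBegin + sizeof(Map);
  const auto element = reinterpret_cast<std::uintptr_t>(&*m.begin());
  return element >= objectBegin && element < objectEnd;
}
```

Note also that `sizeof(Map)` is a compile-time constant, so it cannot grow with the number of elements. Reading an arbitrary `length` bytes into `&tempMap`, as in the question, overwrites the object's internal pointers and may write past the end of the object.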

template <typename Key, typename Value>
struct map_node
{
  Key key;
  Value value;
  map_node* left;
  map_node* right;
};

(The real map implementation is much more complicated than this, but this is a simplified representation.)

If map<Key,Value> contains a map_node<Key,Value> member, then writing it out in binary will write the binary representation of key, value, left, and right, the last two of which are pointers. The same is true of any container that uses indirection of any kind; the addresses will fundamentally differ between the time they are written and the time they are read, since they depend on the state of the program at any given time.

You can write a simple map_node to test this, and just print out the bytes to see what it produces; the pointer will be serialized as well. Behaviorally, this is exactly the same as what you are trying to do with a map and reading from a binary file. See the example below, which includes different addresses.

Live Example
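A minimal sketch of such a test, assuming the simplified map_node above instantiated with two `std::uint32_t` members so that the node is trivially copyable. `Node`, `rawBytes`, and `leftPointerIsInBytes` are illustrative names, and this is not the layout of any real std::map.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

template <typename Key, typename Value>
struct map_node
{
  Key key;
  Value value;
  map_node* left;
  map_node* right;
};

// Trivially copyable instantiation, so copying its bytes is well defined.
using Node = map_node<std::uint32_t, std::uint32_t>;

// Copies the object representation of a node, which is what a raw
// binary write of the node would put into the file.
std::vector<unsigned char> rawBytes(const Node& node)
{
  auto bytes = std::vector<unsigned char>(sizeof(Node));
  std::memcpy(bytes.data(), &node, sizeof(Node));
  return bytes;
}

// Checks that the bytes at the offset of `left` hold the given address.
bool leftPointerIsInBytes(const std::vector<unsigned char>& bytes, const Node* expected)
{
  assert(bytes.size() == sizeof(Node));
  return std::memcmp(bytes.data() + offsetof(Node, left), &expected, sizeof(expected)) == 0;
}
```

The serialized bytes hold the address of the child node rather than the child's key or value, and that address is only meaningful inside the process that wrote it.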

You can use Protocol Buffers to serialize your map in Python, and the deserialization can then be performed in C++.

Protocol Buffers supports both Python and C++.
