Memory-mapped file storage in an STL vector

I'm trying to implement a custom allocator for storing memory-mapped files in a std::vector. The file mapping is performed by boost::iostreams::mapped_file.

Allocator type for file memory mapping:

template<typename T>
class mmap_allocator 
{
public:
  typedef T value_type;

  mmap_allocator(const std::string& filename) 
  : _mmfile(filename) {  } 

  T* allocate (size_t n) 
  { 
     return reinterpret_cast<T*>(_mmfile.data());
  }
  void deallocate (T* p, size_t n) 
  { 
     p = nullptr;
     _mmfile.close();
  }

private:
  boost::iostreams::mapped_file _mmfile;
};

Container for the memory-mapped file, based on std::vector:

//Get file size
long GetFileSize(std::string filename)
{
    FILE *p_file = NULL;
    p_file = fopen(filename.c_str(),"rb");
    fseek(p_file,0,SEEK_END);
    int size = ftell(p_file);
    fclose(p_file);
    return size;
}

template<typename T>
class mm_vector : public std::vector<T, mmap_allocator<T> >
{
public:
  typedef mmap_allocator<T> allocator_type;
  typedef std::vector<T, allocator_type > b_vector;

  mm_vector(const std::string filename) : b_vector(GetFileSize(filename)/sizeof(T), allocator_type(filename)) 
  {  
    b_vector::reserve(GetFileSize(filename)/sizeof(T));
  }
};

Test code:

int main()
{
  mm_vector<int> v("test.f");//test.f - binary file contain several integers
  for(auto x : v) std::cout<<x<<"  ";
}

This code doesn't work properly: every value it prints is zero. The file contains the correct content (several integers). This code works well:

boost::iostreams::mapped_file _mmfile("test.f");
int* p = (int*)(_mmfile.data());
std::cout<<p[0];

What am I doing wrong?

The problem is zero initialization: calling the constructor that receives the size and the allocator initializes the vector elements to the default value of the element type (in this case 0). This is mandated by the standard.

23.3.7.2 vector constructors, copy, and assignment [vector.cons]

explicit vector(size_type n, const Allocator& = Allocator());

- Effects: Constructs a vector with n default-inserted elements using the specified allocator.
- Requires: T shall be DefaultInsertable into *this.
- Complexity: Linear in n.

In my case the file itself ended up filled with 0 too (tested with GCC 4.9.0). That makes sense, because the default mapmode of mapped_file is readwrite, so the zeros written by the vector's constructor go through the mapping into the file.

In the sample code I added a print of the mapped memory content at the moment the allocation happens (in the custom allocator) and another in the constructor of the vector, alongside the existing print in main. The first print outputs the correct data of the file and the second outputs the zeroed version.

#include <cstdio>
#include <string>
#include <vector>
#include <iostream>
#include <chrono>
#include <iomanip>
#include <boost/iostreams/device/mapped_file.hpp>

template <typename T>
class mmap_allocator {
public:
    typedef T value_type;

    mmap_allocator(const std::string& filename) : _mmfile(filename) {}

    T* allocate(size_t n) {
        std::cout << "OUTPUT 1:" << std::endl;
        auto v = reinterpret_cast<T*>(_mmfile.data());
        for (unsigned long idx = 0; idx < _mmfile.size() / sizeof(T); idx++)
            std::cout << v[idx] << " ";
        return reinterpret_cast<T*>(_mmfile.data());
    }
    void deallocate(T* p, size_t n) {
        p = nullptr;
        _mmfile.close();
    }

private:
    boost::iostreams::mapped_file _mmfile;
};

// Get file size
long GetFileSize(std::string filename) {
    FILE* p_file = NULL;
    p_file = fopen(filename.c_str(), "rb");
    fseek(p_file, 0, SEEK_END);
    int size = ftell(p_file);
    fclose(p_file);
    return size;
}

template <typename T>
class mm_vector : public std::vector<T, mmap_allocator<T>> {
public:
    typedef mmap_allocator<T> allocator_type;
    typedef std::vector<T, allocator_type> b_vector;

    mm_vector(const std::string filename)
        : b_vector(GetFileSize(filename) / sizeof(T),
                   allocator_type(filename)) {
        std::cout << std::endl << std::endl << "OUTPUT 2:" << std::endl;
        for (auto x : *this)
            std::cout << x << "  ";
        b_vector::reserve(GetFileSize(filename) / sizeof(T));
    }
};

int main(int argc, char* argv[]) {
    std::chrono::system_clock::time_point begin_time =
        std::chrono::system_clock::now();

    mm_vector<int> v("H:\\save.txt");
    std::cout << std::endl << std::endl << "OUTPUT 3:" << std::endl;
    for (auto x : v)
        std::cout << x << "  ";

    std::chrono::system_clock::time_point end_time =
        std::chrono::system_clock::now();
    long long elapsed_miliseconds =
        std::chrono::duration_cast<std::chrono::milliseconds>(
            end_time - begin_time).count();
    std::cout << "Duration (min:seg:mili): " << std::setw(2)
              << std::setfill('0') << (elapsed_miliseconds / 60000) << ":"
              << std::setw(2) << std::setfill('0')
              << ((elapsed_miliseconds / 1000) % 60) << ":" << std::setw(2)
              << std::setfill('0') << (elapsed_miliseconds % 1000) << std::endl;
    std::cout << "Total milliseconds: " << elapsed_miliseconds << std::endl;

    return 0;
}

You might want to give https://github.com/johannesthoma/mmap_allocator a try. It uses the contents of an mmap'ed file as backing storage for a vector and is LGPL-licensed, so you should be able to use it in your projects. Note that gcc is currently a requirement, but it can be easily extended.

To make the advice from NetVipeC's answer explicit (with help from the mmap_allocator library suggested by Johannes Thoma): if you're using the GNU Standard C++ Library, the following replacement for your mm_vector class prevents the contents of your memory-mapped vector from being initialized to zero (and eliminates the need for the GetFileSize function). It calls a->size(), so the allocator needs a size() member returning the mapped size in bytes (for example, one forwarding to _mmfile.size()), which the mmap_allocator in the question does not define:

template <typename T>
class mm_vector : public std::vector<T, mmap_allocator<T>> {
public:
    typedef mmap_allocator<T> allocator_type;
    typedef std::vector<T, allocator_type> b_vector;

    mm_vector(const std::string filename)
        : b_vector(allocator_type(filename)) {

        allocator_type * a = &b_vector::_M_get_Tp_allocator();
        size_t n = a->size() / sizeof(T);
        b_vector::reserve(n);
        // _M_set_finish(n);
        this->_M_impl._M_finish = this->_M_impl._M_end_of_storage = this->_M_impl._M_start + n;
    }
};

We prevent the contents of the vector from being zeroed by letting it be constructed with the default size of 0, and then fiddling with its internals afterward to adjust the size. It's unlikely that this is a complete solution; for example, I haven't checked whether operations that change the size of the vector work properly.
