
Allocating additional members of a dynamic array in C++

I am trying to find out whether I can allocate memory for additional elements of a dynamic array in C++. The code below is stripped down to the essentials for simplicity's sake.

What I am basically trying to do is read elements from a file into arr in someClass, but the file should not have to specify how many elements it contains (this is essential for the full project I am working on). The obvious solution is to allocate a new array of size+1, copy all the current elements into it, and read the new element, but that would be a brutal waste of processing time and memory if it is possible to simply allocate the next memory address and use it for my array. Is there a way to do that in C++?

To summarize, I want to read an element, allocate a new memory address for my array, and then read another element into that newly allocated memory address. Repeat until the file ends.

I won't bore you with the reasons, but using a simple std::vector is not an option.

Here is the code:

#include <fstream>
#include <iostream>
using namespace std;

class someClass {
public:
    int *arr;
    void Read(ifstream&,int&);
};

void someClass::Read(ifstream &inFile, int &size) {
    arr = new int[0]; // zero-length array: the writes below run past its end
    inFile.open("input.txt");

    int index = 0;
    int element;
    while (!inFile.eof()) {
        inFile >> element;
        *(arr + index) = element;
        index ++;
    }

    size = index;
    inFile.close();
}

int main() {

    int size;
    someClass a;
    ifstream inFile;

    a.Read(inFile,size);
    //obviously unnecessary, just for testing
    for(int i = 0; i < size; i ++) {
        cout << a.arr[i] << " ";
    }
    cout << endl;
}

I liked the question and ran some experiments myself, using the MSVC14 compiler (optimizations disabled).
C++11/14 has the following sequence containers (intentionally excluding dynarray, which was proposed for C++14 but never standardized):

  1. No dynamic resizing (up to the programmer to allocate and deallocate)
    • Raw array (e.g. int arr[N] or new int[n])
    • std::array (e.g. std::array<int, N>)
  2. With dynamic resizing
    • vector (contiguous memory allocation)
    • list (doubly linked list)
    • forward_list (singly linked list)
    • deque (double-ended queue)

Let me start with your questions.

"the solution of allocating a new array with a size+1, and then copying all the current elements into it and reading a new element, but I find that this would be a brutal waste of processing time and memory"

You are right that this is wasteful. When you find you need more memory than you allocated, you have to allocate a new block, copy the existing data into it, and then free the old block. The way to mitigate that overhead is to do it rarely.
So how much should you allocate? size+1 is a bad choice. Each time you are forced to allocate a bigger block, allocate twice the size you already have, so that another reallocation becomes less likely; reallocation is considered an extremely expensive operation.
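To make the difference between the two growth policies concrete, here is a small sketch that only counts how many element copies each policy causes. The names copies_needed, grow_by_one and grow_double are made up for this illustration, and the starting capacity of 1 is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Counts the element copies needed to append n elements one at a time when a
// full buffer of capacity c is replaced by one of capacity grow(c) and every
// existing element is copied across. (Illustrative helper, not from the post.)
template <typename Grow>
std::size_t copies_needed(std::size_t n, Grow grow) {
    assert(n > 0);
    std::size_t capacity = 1;  // assumed starting capacity
    std::size_t size = 0;
    std::size_t copies = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (size == capacity) {
            copies += size;             // every existing element is moved
            capacity = grow(capacity);  // pick the new capacity
        }
        ++size;
    }
    return copies;
}

// The two policies discussed above.
inline std::size_t grow_by_one(std::size_t c) { return c + 1; }
inline std::size_t grow_double(std::size_t c) { return c * 2; }
```

For 1000 appended elements, copies_needed(1000, grow_by_one) is 499500 while copies_needed(1000, grow_double) is 1023: growing by one copies a number of elements proportional to n squared, doubling copies a number proportional to n.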

"if it is possible to simply allocate the next memory address and use it for my array. Is there a way to do that in C++?"

It is not entirely under your control, because the C++ runtime implements the memory management functions. You cannot choose where newly allocated memory will be placed. Sometimes the newly allocated block happens to have the same base address as the previous one; that depends on the runtime and on the memory fragmentation it faces.
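One way to observe this is C's realloc, which may extend a block in place or move it, at the allocator's discretion. The following is a minimal sketch under a few assumptions: grow_ints is a hypothetical helper name, the buffer was obtained from malloc, and the element type is trivially copyable (realloc must not be used for types with non-trivial constructors or for memory obtained with new[]).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Grows a malloc'ed int buffer to new_count elements.
// Returns false on failure, in which case `data` is left untouched and valid.
// `moved` reports whether the block ended up at a different address.
bool grow_ints(int*& data, std::size_t new_count, bool& moved) {
    assert(new_count > 0);
    // Remember the old address as an integer before realloc invalidates it.
    const std::uintptr_t before = reinterpret_cast<std::uintptr_t>(data);
    void* p = std::realloc(data, new_count * sizeof(int));
    if (p == nullptr) {
        return false;
    }
    moved = (reinterpret_cast<std::uintptr_t>(p) != before);
    data = static_cast<int*>(p);  // existing contents are preserved
    return true;
}
```

Whether moved comes back true or false is up to the allocator; the caller only gets the guarantee that the old contents are still there after a successful call.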

I ran some benchmarks using the malloc and realloc functions borrowed from C. Here is the code:

    auto start = chrono::steady_clock::now();

    auto partialSize = 100;
    auto arr = (int *) malloc(sizeof(int) * partialSize);


    for (auto i = 0; i < SIZE; i++) {
        arr[i] = i;
        if (i == partialSize - 1) {
            partialSize = partialSize << 1; // for 2X
            arr = (int *) realloc(arr, sizeof(int) * partialSize);
        }
    }

    auto duration = chrono::steady_clock::now() - start;

    free(arr);

    cout << "Duration: " << chrono::duration_cast<chrono::milliseconds>(duration).count() << "ms" << endl;

Results (insertion of 100,000,000 integers; time is the average of 3 runs):

  • Start size = 100, growth factor = 1.5X, time = 1.35s
  • Start size = 100, growth factor = 2X, time = 0.65s
  • Start size = 100, growth factor = 4X, time = 0.42s
  • Start size = 10,000, growth factor = 1.5X, time = 0.96s
  • Start size = 10,000, growth factor = 2X, time = 0.79s
  • Start size = 10,000, growth factor = 4X, time = 0.51s

Another case is using C++'s new keyword and checking for relocation:

    auto start = chrono::steady_clock::now();

    auto partialSize = 100;
    auto arr = new int[partialSize];

    for (auto i = 0; i < SIZE; i++) {
        arr[i] = i;
        if (i == partialSize - 1) {
            // Note: the old elements are not copied and the old block is not
            // freed here, so this measures allocation only.
            auto newArr = new int[partialSize << 2]; // for 4X
            partialSize = partialSize << 2;
            arr = newArr;
        }
    }

    auto duration = chrono::steady_clock::now() - start;

    delete[] arr;

    cout << "Duration: " << chrono::duration_cast<chrono::milliseconds>(duration).count() << "ms" << endl;

Results (insertion of 100,000,000 integers; time is the average of 3 runs):

  • Start size = 100, growth factor = 1.5X, time = 0.63s
  • Start size = 100, growth factor = 2X, time = 0.44s
  • Start size = 100, growth factor = 4X, time = 0.36s
  • Start size = 10,000, growth factor = 1.5X, time = 0.65s
  • Start size = 10,000, growth factor = 2X, time = 0.52s
  • Start size = 10,000, growth factor = 4X, time = 0.42s

For the rest (dynamically resizable containers):

auto start = chrono::steady_clock::now();

//auto arr = vector<int>{};
//auto arr = list<int>{};
//auto arr = new std::array<int, SIZE>{};
//auto arr = new int[SIZE];
//auto arr = deque<int>{};
auto arr = forward_list<int>{};

for (auto i = 0; i < SIZE; i++) {
    arr.push_front(i);
    // arr.push_back(i)
}

auto duration = chrono::steady_clock::now() - start;

cout << "Duration: " << chrono::duration_cast<chrono::milliseconds>(duration).count() << "ms" << endl;

Results (insertion of 100,000,000 integers; time is the average of 3 runs):

  1. vector: 2.17s
  2. list: 10.31s
  3. array (no reallocation): N/A; error: the compiler ran out of heap space
  4. raw int array (no reallocation): 0.22s
  5. deque: 3.47s
  6. forward_list: 8.78s

Hope it helps.

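arr = new int[1];
int capacity = 1;
size = 0;  // size is the int& parameter of Read
inFile.open("input.txt");

int element;
// Test the extraction itself; looping on !eof() would process a failed read.
while (inFile >> element) {
    if (size == capacity) {
        // Full: double the capacity and move the existing elements over.
        capacity *= 2;
        int *newbuf = new int[capacity];
        std::copy_n(arr, size, newbuf);  // requires <algorithm>
        delete[] arr;
        arr = newbuf;
    }
    arr[size] = element;
    size++;
}

inFile.close();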

You can simulate what std::vector does: double the capacity every time it gets full.
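As a self-contained sketch of the same idea, the doubling logic can be wrapped in a function that reads from any std::istream, which makes it easy to try with an in-memory stream before pointing it at a file. The function name read_all_ints and the starting capacity of 4 are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>

// Hypothetical helper: reads whitespace-separated ints from `in` into a heap
// array that doubles whenever it is full. The element count is returned in
// `size`; the caller owns the array and must delete[] it.
int* read_all_ints(std::istream& in, std::size_t& size) {
    std::size_t capacity = 4;  // assumed starting capacity
    int* data = new int[capacity];
    size = 0;

    int value;
    while (in >> value) {  // stops at end of input or at the first non-integer
        if (size == capacity) {
            const std::size_t new_capacity = capacity * 2;
            int* bigger = new int[new_capacity];
            std::copy_n(data, size, bigger);  // move the old values across
            delete[] data;
            data = bigger;
            capacity = new_capacity;
        }
        data[size++] = value;
    }
    assert(size <= capacity);
    return data;
}
```

With std::istringstream in("1 2 3 4 5 6 7 8 9"); and std::size_t n; the call read_all_ints(in, n) returns an array holding the nine values and sets n to 9. An std::ifstream can be passed in exactly the same way.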

You will need to allocate new space for the bigger array and move the old values over:

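// Here size holds the allocated capacity and is assumed to be greater than 0.
void resize() {
    size_t newSize = size * 2;
    int* newArr = new int[newSize];
    memcpy( newArr, arr, size * sizeof(int) );  // requires <cstring>
    size = newSize;
    delete [] arr;
    arr = newArr;
}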

I will propose a different solution, in which you do not need to copy the previously read content of the file every time. The idea is to use a linked list in which every element is twice the size of its predecessor (another possibility is to make the sizes grow like a Fibonacci series). In this way, allocations become rarer and rarer as the file grows and, if you need to, you can release memory from the beginning of the file by freeing the first elements of the list. Of course, you pay more when reading, since the storage is no longer one contiguous block and an access has to walk the list. Here is some example code illustrating the idea:

struct buffer_list
{
    void append_next_chunk(size_t size, char * buff)
    {
        if(buffer == nullptr) {
            buffer = buff;
            local_size = size;
            return;
        }
        if(next == nullptr) next = new buffer_list();
        next->append_next_chunk(size, buff);
    }


    char read(int offset)
    {
        if(offset >= local_size) return next->read(offset-local_size);
        return buffer[offset];
    }
    buffer_list * next = nullptr;
    char *buffer = nullptr;
    size_t local_size = 0;
    ~buffer_list()
    {
        delete[] buffer;
        delete next;
    }
};


struct custom_vector
{
    custom_vector(const size_t size) {
        write_ptr = new char[size];
        inner_list.append_next_chunk(size, write_ptr);
        total_size = size;
        last_created_size = size;
    }


    void push_back(char c){
        if(written_size == total_size)
        {
            last_created_size *= 2;
            write_ptr = new char[last_created_size];
            write_offset = total_size;
            inner_list.append_next_chunk(last_created_size, write_ptr);
            total_size += last_created_size;
        }
        write_ptr[written_size - write_offset] = c;
        written_size++;
    }

    char read(int offset)
    {
        return inner_list.read(offset);
    }

    size_t size() { return written_size; }

    char * write_ptr = nullptr;
    buffer_list inner_list;
    size_t written_size = 0;
    size_t total_size = 0;
    size_t write_offset = 0;
    size_t last_created_size = 0;
}; 

On my machine custom_vector performs much better than std::vector on write operations, while a big penalty is paid when reading. However, I think some optimizations for sequential reads could easily be implemented to solve that issue.
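One possible shape for such a sequential-read optimization is to walk the chunks in order instead of looking up each offset from the head of the list. The sketch below is an independent, simplified variant of the idea rather than a patch to the code above: the class name chunked_chars, the tail pointer and the for_each member are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical chunked buffer: each new chunk is twice as large as the last,
// and existing elements are never copied.
class chunked_chars {
    struct chunk {
        char* data;
        std::size_t capacity;
        std::size_t used;
        chunk* next;
    };
    chunk* head_;
    chunk* tail_;  // appending goes straight to the last chunk
    std::size_t size_;

    static chunk* make_chunk(std::size_t capacity) {
        return new chunk{new char[capacity], capacity, 0, nullptr};
    }

public:
    explicit chunked_chars(std::size_t first_capacity)
        : head_(nullptr), tail_(nullptr), size_(0) {
        assert(first_capacity > 0);
        head_ = tail_ = make_chunk(first_capacity);
    }
    chunked_chars(const chunked_chars&) = delete;
    chunked_chars& operator=(const chunked_chars&) = delete;
    ~chunked_chars() {
        while (head_ != nullptr) {
            chunk* next = head_->next;
            delete[] head_->data;
            delete head_;
            head_ = next;
        }
    }

    void push_back(char c) {
        if (tail_->used == tail_->capacity) {
            // Full: link a chunk of double the size; nothing is copied.
            tail_->next = make_chunk(tail_->capacity * 2);
            tail_ = tail_->next;
        }
        tail_->data[tail_->used++] = c;
        ++size_;
    }

    std::size_t size() const { return size_; }

    // Visits every element in insertion order, one chunk at a time, so a full
    // sequential read never restarts from the head of the list.
    template <typename Visitor>
    void for_each(Visitor visit) const {
        for (const chunk* c = head_; c != nullptr; c = c->next) {
            for (std::size_t i = 0; i < c->used; ++i) {
                visit(c->data[i]);
            }
        }
    }
};
```

Random access by offset would still have to walk the chunk list, but because chunk sizes double, the number of chunks grows only logarithmically with the number of stored elements.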

You can find the number of elements in the file by counting the delimiters and then size the array accordingly. For example, let's assume that your file has one element per line:

1
2
3
4
5

You can count the number of lines in the file using the appropriate line separator. On Linux it can be done like this:

int elemCount = std::count(std::istreambuf_iterator<char>(inFile),
    std::istreambuf_iterator<char>(), '\n');
inFile.clear();
inFile.seekg(0, std::ios::beg);

Then you can allocate the array as:

arr = new int[elemCount];

If your file is space- or tab-delimited, change '\n' to ' ' or '\t' as appropriate. You can then read in your data as before. You may need to add or subtract 1 depending on your delimiter and on how the file is built (for example, whether the last line ends with a newline). This approach is also a little dangerous, because empty rows, doubled delimiters and the like can throw off the count. If I did this, I would fill the array with some default value and remove the leftover defaults after reading, to be sure all my values were good. That would require one resize after all the reading is done.
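A variant of the two-pass approach that sidesteps the delimiter-counting pitfalls is to count by actually parsing integers in the first pass, then rewind and read into an array of exactly that size. This is only a sketch: count_ints and read_exact are hypothetical names, and it assumes the stream is seekable (a regular file or a string stream).

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>

// First pass: counts the ints that can be extracted, then rewinds the stream.
// Blank lines and repeated whitespace do not affect the count.
std::size_t count_ints(std::istream& in) {
    std::size_t count = 0;
    int ignored;
    while (in >> ignored) {
        ++count;
    }
    in.clear();                  // reset eofbit/failbit so seekg works
    in.seekg(0, std::ios::beg);  // back to the start for the second pass
    return count;
}

// Second pass: allocates exactly `size` ints and fills them.
// The caller owns the returned array and must delete[] it.
int* read_exact(std::istream& in, std::size_t& size) {
    size = count_ints(in);
    int* data = new int[size];  // new int[0] is valid if the input is empty
    for (std::size_t i = 0; i < size; ++i) {
        const bool ok = static_cast<bool>(in >> data[i]);
        assert(ok);  // the same values parsed successfully in the first pass
        (void)ok;
    }
    return data;
}
```

The cost is that the file is parsed twice, in exchange for a single allocation and no copying.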
