
Size limit of arrays and vectors in C++

I want to store pow(10,12) integers in an array. I know an integer array can only store up to pow(10,7) integers, so what should I do now? Can a vector be used for this purpose, and if so, what is the size limit of a vector? Is there any other way to accomplish this, apart from arrays and vectors?

Additional detail: compiler = TDM-GCC 4.9.2 64-bit Release.

The most obvious limitation is not the language, it is the computer.

10^12 integers of type int would take 4 terabytes of memory, assuming the usual 4-byte int. Do you have access to an expensive supercomputer with that much RAM? They typically cost millions of dollars or euros.
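The arithmetic behind that figure can be checked with a short sketch. The helper name bytes_needed is hypothetical, and the element size is passed in explicitly because the 4-byte int is an assumption about the platform:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: bytes needed for `count` elements of `elem_size`
// bytes each, computed in 64-bit arithmetic.
std::uint64_t bytes_needed(std::uint64_t count, std::uint64_t elem_size) {
    // Guard against overflow of the 64-bit product.
    assert(elem_size == 0 || count <= UINT64_MAX / elem_size);
    return count * elem_size;
}
```

Calling bytes_needed(1000000000000ULL, 4) gives 4000000000000, that is 4 * 10^12 bytes or 4 terabytes. The same product computed in 32-bit int arithmetic would overflow.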

If you do, you can probably use a heap-allocated std::vector<int> or std::array<int, 1000ULL*1000*1000*1000> (at least on some x86-64 Linux supercomputer systems). Note the ULL suffix: a plain 1000*1000*1000*1000 is evaluated in int and overflows.
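As for the size limit of a vector: the library reports its theoretical limit through max_size(). The sketch below (function names are hypothetical) only checks that limit; it says nothing about whether the machine has enough RAM, which is the real constraint here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: largest element count a std::vector<int> can
// theoretically hold on this implementation.
std::uint64_t vector_int_limit() {
    const std::uint64_t limit = std::vector<int>().max_size();
    assert(limit > 0); // any usable implementation reports a nonzero limit
    return limit;
}

// True if `count` elements stay within that theoretical limit.
// This does NOT mean the allocation would succeed.
bool within_vector_limit(std::uint64_t count) {
    return count <= vector_int_limit();
}
```

On typical 64-bit implementations the reported limit is far above 10^12, so within_vector_limit(1000000000000ULL) should be true there; the allocation itself would still fail or thrash without terabytes of memory.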

But you probably don't: your computer has much less than 4 terabytes (4096 gigabytes) of RAM; if it is a desktop it might have a few dozen gigabytes at most. TDM-GCC is for Windows, and all supercomputers today run some variant of Linux.

BTW, while heap memory is in virtual memory, in practice you'll experience thrashing if you allocate (as suggested in sehe's answer) a terabyte of data on a computer with only gigabytes of RAM. See mmap(2) & madvise(2) on Linux.

Perhaps you could access that array in chunks. Then consider storing the chunks in some database (maybe using PostgreSQL or SQLite) or perhaps in a big binary file. You'll need a lot of disk space to fit the 4 TB requirement.
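A minimal sketch of the big-binary-file variant, assuming fixed-size chunks addressed by index. The names kChunkInts, write_chunk and read_chunk are hypothetical, and the chunk size is kept small for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical chunk size: number of ints read or written at a time.
const std::size_t kChunkInts = 1024;

// Byte offset of chunk number `chunk_index` inside the file.
std::streamoff chunk_offset(std::uint64_t chunk_index) {
    return static_cast<std::streamoff>(chunk_index * kChunkInts * sizeof(int));
}

// Writes one full chunk at its position. The file must already exist.
bool write_chunk(const std::string& path, std::uint64_t chunk_index,
                 const std::vector<int>& chunk) {
    assert(chunk.size() == kChunkInts);
    std::fstream f(path, std::ios::in | std::ios::out | std::ios::binary);
    if (!f) return false;
    f.seekp(chunk_offset(chunk_index));
    f.write(reinterpret_cast<const char*>(chunk.data()),
            static_cast<std::streamsize>(kChunkInts * sizeof(int)));
    return static_cast<bool>(f);
}

// Reads one full chunk into `chunk`; returns false on a short read.
bool read_chunk(const std::string& path, std::uint64_t chunk_index,
                std::vector<int>& chunk) {
    chunk.resize(kChunkInts);
    std::ifstream f(path, std::ios::binary);
    if (!f) return false;
    f.seekg(chunk_offset(chunk_index));
    f.read(reinterpret_cast<char*>(chunk.data()),
           static_cast<std::streamsize>(kChunkInts * sizeof(int)));
    return f.gcount() == static_cast<std::streamsize>(kChunkInts * sizeof(int));
}
```

Only one chunk lives in RAM at a time, so memory use stays at kChunkInts * sizeof(int) bytes no matter how large the file grows. Reopening the file on every call keeps the sketch short; real code would keep the stream open.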

BTW, if you handle that much data, I strongly recommend learning and using Linux on your machine and coding for Linux, since all supercomputers and nearly all cloud clusters are Linux based. You could prototype your software on your Linux laptop or desktop with a small amount of data (e.g. 8 GB), then port it to some cloud or supercomputer (which will cost you big bucks).

I'm not saying you should do this. But you could leverage your system's virtual memory manager (VMM); all modern operating systems have one.

Here's a simple sample that allocates the file (3.7TB). It doesn't actually write the blocks unless your filesystem doesn't support sparse files.

It then proceeds to write 5 random values at random indices in your array.

On most systems, this will end up writing at most five 4k blocks to disk, while the file is nominally 3.7TB. The operating system deals with swapping the pages in and out on demand and writing changes back to disk.

Live On Coliru¹

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#include <iostream>
#include <random>     // for random writes only
#include <functional> // for random writes only

int main() {
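    const size_t N = 1000000000000ull; // 10^12 elements

    int fd = open("large.db", O_RDWR|O_CREAT, 0777);

    if (fd==-1) {
        perror("opening");
        return 1; // nothing to map without a file
    }

    // Arguments are (fd, mode, offset, len): allocating 1 byte at the far end
    // extends the file to its full size while leaving the rest sparse.
    if (-1==fallocate64(fd, 0, sizeof(int)*N, 1)) {
        perror("fallocate");
        return 1; // writing through a mapping of a too-short file would fault
    }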

    int* data = (int*) mmap64(nullptr, sizeof(int)*N, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);

    if (data && data!=MAP_FAILED) {
        auto randindex = std::bind(std::uniform_int_distribution<size_t>(0, N-1), std::mt19937{ std::random_device{} () });

        for(int i=0; i<5; ++i) 
            data[randindex()] = rand();
    } else {
        perror("mmap");
    }

    if (data != MAP_FAILED && munmap(data, sizeof(int)*N)) // skip munmap if mmap failed
        perror("munmap");

    close(fd);
}

Inspecting the resulting large.db with, e.g., od large.db on Linux will show that the changed data was actually persisted to disk.

¹ Coliru's limits don't allow this (obviously).

"I want to store pow(10,12) integer numbers in an array. I know integer array can only store up to pow(10,7) integers. So what should I do now?"

Create a wrapper class that contains an array of arrays, each of length MAX_LENGTH.

When the user wants to access position (MAX_LENGTH + 10) or (2 * MAX_LENGTH + 11), your access method simply does the math to figure out which array holds that range of values, then does the get/set on it.
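The index math could look like the sketch below. It is one possible reading of the idea, not a definitive implementation: the class layout is an assumption, MAX_LENGTH is set to a tiny illustrative value, and everything is still held in RAM, so it does not by itself make 10^12 elements fit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative block length; a real value would be much larger.
const std::size_t MAX_LENGTH = 1000;

// One logical index space backed by several blocks of at most MAX_LENGTH ints.
class SuperArray {
public:
    explicit SuperArray(std::uint64_t size) : size_(size) {
        std::uint64_t remaining = size;
        while (remaining > 0) {
            // The last block may be shorter than MAX_LENGTH.
            const std::size_t len = remaining < MAX_LENGTH
                                        ? static_cast<std::size_t>(remaining)
                                        : MAX_LENGTH;
            blocks_.push_back(std::vector<int>(len, 0));
            remaining -= len;
        }
    }

    int get(std::uint64_t i) const { return blocks_[block_of(i)][i % MAX_LENGTH]; }
    void set(std::uint64_t i, int v) { blocks_[block_of(i)][i % MAX_LENGTH] = v; }
    std::uint64_t size() const { return size_; }

private:
    // Which block holds logical index i.
    std::size_t block_of(std::uint64_t i) const {
        assert(i < size_);
        return static_cast<std::size_t>(i / MAX_LENGTH);
    }

    std::uint64_t size_;
    std::vector<std::vector<int> > blocks_;
};
```

With SuperArray a(3 * MAX_LENGTH), the call a.set(MAX_LENGTH + 10, 42) lands in the second block at offset 10, and a.get(MAX_LENGTH + 10) reads it back.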

If not every element in this SuperArray(tm) will be populated, then perhaps use a sparse vector/array?
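A sparse variant might be sketched with a hash map that stores only the non-zero entries. The class name SparseArray and the choice of std::unordered_map are assumptions made for illustration; unset positions read back as 0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Logical array of `size` ints where only non-zero entries use memory.
class SparseArray {
public:
    explicit SparseArray(std::uint64_t size) : size_(size) {}

    // Unset positions read as 0.
    int get(std::uint64_t i) const {
        assert(i < size_);
        const auto it = values_.find(i);
        return it == values_.end() ? 0 : it->second;
    }

    // Storing 0 removes the entry so memory tracks the populated count.
    void set(std::uint64_t i, int v) {
        assert(i < size_);
        if (v == 0) values_.erase(i);
        else values_[i] = v;
    }

    // Number of entries actually held in memory.
    std::size_t stored() const { return values_.size(); }

private:
    std::uint64_t size_;
    std::unordered_map<std::uint64_t, int> values_;
};
```

A SparseArray s(1000000000000ULL) with a handful of entries costs memory proportional to those entries, not to the 10^12 logical size. Each stored entry carries noticeable per-node overhead, so this only pays off when the array is mostly empty.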

Caveat: your system might not have the RAM + swap to support it, and you will end up building some fun, exotic solution that shuttles data between filesystem files and RAM, or database tables, or ???

