Storing many elements in std::vector (C++)

Question

For one of my applications I need to generate a vector of size 2^35 (my machine has 96 GB of RAM, so this vector can easily fit into RAM).

#include <cmath>
#include <cstdlib>
#include <ctime>
#include <vector>
using namespace std;

int main ()
{
  int i;

  /* initialize random seed: */
  srand (time(NULL));

  vector<int> vec;
  do {
     i = rand() % 10 + 1;
     vec.push_back(i);
  } while ((vec.size()*sizeof(int))<pow(2,35));

  return 0;
}

However, I notice that my do-while loop runs indefinitely. One possible reason is that the range of vec.size() is long unsigned int, which is much smaller than the number of elements inserted, i.e. pow(2,35), so I think it goes into an infinite loop. I may be wrong; please correct me if I am. Can someone tell me how I can insert more than pow(2,35) numbers into vec?

gcc version: 4.8.2

Answer 1

I'll try to address some of your problems in a simple solution:

The first problem you have is space. Since you only need numbers from 1 to 10, an int8_t would serve you much better.

Second is speed. std::vector does a lot of allocations and reallocations under the hood when it grows via push_back. Since your size is fixed, in my opinion there's no need to use it. Knowing this, we'll use a simple array and threads to improve performance.

Here's the code:

#include <array>
#include <random>
#include <thread>
#include <cstdint>
#include <memory>
#include <chrono>

// Since you only need numbers from 1-10, a single byte will work nicely.
const uint64_t size = UINT64_C(0x800000000); // Exactly 2^35
typedef std::array<int8_t, size> vec_t;

// start is first element, end is one-past the last. This is a template so we can generate multiple functions.
template<unsigned s>
void fill(vec_t::iterator start, vec_t::iterator end) {
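    // Truncate the clock reading to unsigned; the factor (s+1) gives each instantiation a different seed.
    static const unsigned seed = static_cast<unsigned>(
        std::chrono::system_clock::now().time_since_epoch().count() * (s + 1));
    static std::default_random_engine generator(seed);
    // uniform_int_distribution is not specified for character types such as int8_t, so draw ints and narrow.
    static std::uniform_int_distribution<int> distribution(1, 10);
    for (auto it = start; it != end; ++it) {
        *it = static_cast<int8_t>(distribution(generator));  // number in the range 1..10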
    }
}

int main() {
    auto vec = std::unique_ptr<vec_t>(new vec_t());

    // Each will have its own generator and distribution.
    std::thread a(fill<0>, vec->begin(), vec->begin() + size/4);
    std::thread b(fill<1>, vec->begin() + size/4, vec->begin() + size/2);
    std::thread c(fill<2>, vec->begin() + size/2, vec->begin() + (size/4)*3);
    std::thread d(fill<3>, vec->begin() + (size/4)*3, vec->end());
    a.join();
    b.join();
    c.join();
    d.join();
    return 0;
}

Answer 2

Why can't you use the constructor?

std::vector<int> vec ( number_of_elements );

That way the memory is allocated up front (and the elements are zero-initialized), and you can then randomize the elements using std::generate or something similar.

Answer 3

Update

As Baum mit Augen has highlighted, this post doesn't really answer the question, because on his platform condition 4 doesn't hold (sizeof(std::size_t) is actually 8). However, I leave this post here to highlight an issue that might occur when porting the code.

Original post

One problem that I see is the following. Let's assume (most platforms fulfill these assumptions) that:

1) vec.size() returns std::size_t (not guaranteed);

2) sizeof returns std::size_t (guaranteed);

3) std::size_t is an unsigned integer type (guaranteed);

4) sizeof(std::size_t) == 4 (not guaranteed);

5) CHAR_BIT == 8 (not guaranteed).

(Recall that CHAR_BIT is the number of bits in a char.)

Therefore, the type of vec.size()*sizeof(int) is std::size_t and its maximum value is 2^(sizeof(std::size_t)*CHAR_BIT) - 1 == 2^32 - 1 < 2^32 < 2^35. Hence vec.size()*sizeof(int) is always smaller than 2^35, the loop condition is always true, and the loop never terminates.

Answer 1: 2, 2015-04-23 12:43:39

Answer 2: 2, 2015-04-23 12:55:25

Answer 3: 1, 2015-04-23 12:36:46
