Is there any array-like data structure that can grow in size on both sides?

I'm a student working on a small project for a high-performance computing course, so efficiency is a key issue.

Let's say that I have a vector of N floats and I want to remove the n smallest elements and the n largest elements. There are two simple ways of doing this:

A

sort in ascending order    // O(NlogN)
remove the last n elements // O(1)
invert elements order      // O(N)
remove the last n elements // O(1)

B

sort in ascending order     // O(NlogN)
remove the last n elements  // O(1)
remove the first n elements // O(N)

In A, reversing the order of the elements requires swapping all of them, while in B, removing the first n elements requires moving all the others to occupy the positions left empty. Using std::remove would give the same problem.

If I could remove the first n elements for free, then solution B would be cheaper. That should be easy to achieve if, instead of a vector (i.e. an array with some empty space after vector::end()), I had a container with some free space before vector::begin() as well.

So the question is: does there already exist an array-like container (i.e. contiguous memory, no linked lists) in some library (STL, Boost) that allows O(1) insertion/removal on both sides of the array?

If not, do you think there are better solutions than creating such a data structure?

Have you thought of using std::partition with a custom functor, as in the example below:
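For reference, approach B might be written roughly as follows. This is only a sketch: the helper name trim_sorted is made up for illustration, and it assumes that 2 * n does not exceed the vector's size. The final erase is the step that shifts the remaining elements and costs O(N).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the question): approach B on a std::vector.
// Assumes 2 * n <= v.size().
void trim_sorted(std::vector<float>& v, std::size_t n) {
    assert(2 * n <= v.size());
    std::sort(v.begin(), v.end());   // O(N log N)
    v.resize(v.size() - n);          // drop the n largest: nothing has to move
    // Drop the n smallest: the survivors are shifted to the front, O(N).
    v.erase(v.begin(), v.begin() + static_cast<std::ptrdiff_t>(n));
}
```

Calling trim_sorted(v, 2) on {5, 1, 4, 2, 3, 6} would leave {3, 4}.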

#include <iostream>
#include <vector>
#include <algorithm>

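// Predicate that is true for values inside the closed interval [low, up].
template<typename T>
class greaterLess {
    T low;
    T up;
  public:
    greaterLess(T const &l, T const &u) : low(l), up(u) {}
    // const-qualified so the predicate can be called on const copies
    bool operator()(T const &e) const { return !(e < low || e > up); }
};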

int main()
{
    std::vector<double> v{2.0, 1.2, 3.2, 0.3, 5.9, 6.0, 4.3};
    auto it = std::partition(v.begin(), v.end(), greaterLess<double>(2.0, 5.0));
    v.erase(it, v.end());

    for(auto i : v) std::cout << i << " ";
    std::cout << std::endl;

    return 0;
}

This way you would erase elements from your vector in O(N) time.

Try boost::circular_buffer:

It supports random access iterators, constant time insert and erase operations at the beginning or the end of the buffer and interoperability with std algorithms.

Having looked at the source, it seems (and is only logical) that the data is kept as a contiguous memory block.

The one caveat is that the buffer has a fixed capacity, and once it is exhausted, elements will get overwritten. You can either detect such cases yourself and resize the buffer manually, or use boost::circular_buffer_space_optimized with a humongous declared capacity, since it won't allocate the memory unless it is needed.

To shrink and grow a vector at both ends, you can use the idea of slices, reserving extra memory ahead of time at the front and back to expand into, if efficient growth is needed.

Simply put, make a class that holds not only a length but also indices for the first and last elements, plus a suitably sized vector, to create a window of data on the underlying block of stored floats. A C++ class can provide inlined functions for things like deleting items, addressing into the array, finding the nth largest value, and shifting the slice values down or up to insert new elements while maintaining sorted order. Should no spare elements be available, dynamically allocating a new, larger float store permits continued growth at the cost of an array copy.
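A minimal sketch of such a window class is shown here, assuming it only needs to shrink at both ends. The class name FloatWindow and its member names are hypothetical, and growth into spare capacity or reallocation is left out for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical window over a vector: only indices [first_, last_) are live.
class FloatWindow {
    std::vector<float> data_;  // underlying contiguous storage
    std::size_t first_;        // index of the first live element
    std::size_t last_;         // one past the last live element
  public:
    explicit FloatWindow(std::vector<float> values)
        : data_(std::move(values)), first_(0), last_(data_.size()) {}

    std::size_t size() const { return last_ - first_; }

    float& operator[](std::size_t i) {
        assert(i < size());
        return data_[first_ + i];
    }

    // Pointers into the contiguous live range, usable with std algorithms.
    float* begin() { return data_.data() + first_; }
    float* end()   { return data_.data() + last_; }

    void sort() { std::sort(begin(), end()); }

    // Both removals only move an index, so they are O(1) whatever n is.
    void drop_front(std::size_t n) { assert(n <= size()); first_ += n; }
    void drop_back(std::size_t n)  { assert(n <= size()); last_ -= n; }
};
```

With this, sorting and then calling drop_front(n) and drop_back(n) removes the n smallest and n largest values without moving any element. The dropped values stay in the underlying storage until the object is destroyed, which is the memory cost of this design.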

A circular buffer is designed as a FIFO, with new elements added at the end and removed at the front, and without insertion in the middle. A self-defined class can also (trivially) support array subscript values different from 0..N-1.

Thanks to memory locality, the absence of excessive indirection through pointer chains, and the pipelining of subscript calculations on a modern processor, a solution based on an array (or a vector) is likely to be the most efficient, despite element copying on insertion. A deque would be suitable, but it fails to guarantee contiguous storage.

Additional supplementary info: researching classes that provide slices turns up some plausible alternatives to evaluate:

A) std::slice, which uses slice_arrays
B) Boost Class Range

I hope this is the kind of specific information you were looking for. In general, a simpler, clearer solution is more maintainable than a tricky one. I would expect slices and ranges on sorted data sets to be quite common, for example when filtering experimental data where "outliers" are excluded as faulty readings.
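If contiguity turns out not to be a hard requirement, the std::deque variant could look like the sketch below. The helper name trim_deque is hypothetical, and it assumes 2 * n does not exceed the deque's size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical helper: trim n values from each end of a sorted std::deque.
// Assumes 2 * n <= d.size(). Storage is NOT guaranteed to be contiguous.
void trim_deque(std::deque<float>& d, std::size_t n) {
    assert(2 * n <= d.size());
    std::sort(d.begin(), d.end());  // deque iterators are random access
    for (std::size_t i = 0; i < n; ++i) {
        d.pop_front();              // O(1), remaining elements are not shifted
        d.pop_back();               // O(1)
    }
}
```

Whether the cheaper front removal outweighs the extra indirection of a deque during the sort is something only a measurement on the real data can tell.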

I think a good solution should actually be O(NlogN) for the sort plus 2 x O(1) for the removals, with binary searches of O(logN + 1) for filtering on outlying values instead of deleting a fixed number of small or large values. It also matters that the "O" is relatively fast: for practical values of N, an O(1) algorithm can sometimes be slower in practice than an O(N) one.
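Filtering by value with binary searches might be sketched as follows, assuming the data is already sorted in ascending order. The helper name inlier_range and the threshold parameters low and high are made up for illustration. Nothing is erased; the function only returns the bounds of the values to keep.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

using Iter = std::vector<float>::const_iterator;

// Hypothetical helper: in an ascending sorted vector, return the iterator
// range [first, last) of the values that lie inside [low, high].
std::pair<Iter, Iter> inlier_range(const std::vector<float>& sorted,
                                   float low, float high) {
    // Debug-only precondition checks (the is_sorted check is itself O(N)).
    assert(std::is_sorted(sorted.begin(), sorted.end()));
    assert(low <= high);
    Iter first = std::lower_bound(sorted.begin(), sorted.end(), low);  // first value >= low
    Iter last  = std::upper_bound(first, sorted.end(), high);          // first value > high
    return {first, last};
}
```

Each search is O(log N) on a vector, and the returned pair can be used directly as a slice of the sorted data.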

As a complement to @40two's answer: before partitioning the array, you will need to find the partitioning pivots, that is, the nth smallest number and the nth greatest number in an unsorted array. There is a discussion of that on SO: How to find the kth largest number in unsorted array

There are several algorithms to solve this problem. Some are deterministic O(N); one of them is a variation on finding the median (median of medians). There are also some non-deterministic algorithms with O(N) average-case complexity. A good source book for those algorithms is Introduction to Algorithms. Also in books like

So eventually, your code will run in O(N) time.
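In C++, one way to get this behaviour without writing a selection algorithm by hand is std::nth_element, which is linear on average. The sketch below is an assumption about how the pieces could fit together, not the answerer's code; the helper name trim_unsorted is hypothetical, and it assumes 2 * n does not exceed the vector's size. Note that the surviving elements are not left in sorted order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: remove the n smallest and n largest values
// without a full sort. Assumes 2 * n <= v.size().
void trim_unsorted(std::vector<float>& v, std::size_t n) {
    assert(2 * n <= v.size());
    if (n == 0) return;
    const auto k = static_cast<std::ptrdiff_t>(n);
    auto lo = v.begin() + k;  // first element to keep
    auto hi = v.end() - k;    // one past the last element to keep
    std::nth_element(v.begin(), lo, v.end());  // n smallest now in [begin, lo)
    std::nth_element(lo, hi, v.end());         // n largest now in [hi, end)
    v.erase(hi, v.end());                      // drop the n largest
    v.erase(v.begin(), v.begin() + k);         // drop the n smallest, O(N) shift
}
```

For {5, 1, 4, 2, 3, 6} and n = 2, the vector ends up holding 3 and 4 in unspecified order.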
