
Splittable data structure (in C++11)

I wonder if anybody could help me out.

I am looking for a data structure (such as a list, queue, stack, array, vector, binary tree, etc.) that supports these four operations:

  • isEmpty (true/false)
  • insert a single element
  • pop (i.e. get and remove) a single element
  • split into two structures, e.g. take approximately half (let's say +/- 20%) of the elements and move them to another structure

Note that I don't care about the order of the elements at all.

Insert/pop example:

A.insert(1), A.insert(2), A.insert(3), A.insert(4), A.insert(5) // contains 1,2,3,4,5 in any order
A.pop() // 3
A.pop() // 2
A.pop() // 5
A.pop() // 1
A.pop() // 4

And the split example:

A.insert(1), A.insert(2), A.insert(3), A.insert(4), A.insert(5)
A.split(B)
// A = {1,4,3}, B={2,5} in any order

I need the structure to be as fast as possible: preferably all four operations in O(1). I doubt it has already been implemented in std, so I will implement it myself (in C++11, so std::move can be used).

Note that insert, pop and isEmpty are called about ten times more frequently than split.

I tried some coding with list and vector, but with no success:

#include <vector>
#include <iostream>
#include <iterator>  // std::make_move_iterator

// g++ -Wall -g -std=c++11
/*
output:
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
5 6 7 8 9
*/

int main ()
{
        std::vector<int> v1;

        for (int i = 0; i < 10; ++i) v1.push_back(i);

        for (auto i : v1) std::cout << i << " ";
        std::cout << std::endl;

        auto halfway = v1.begin() + v1.size() / 2;
        auto endItr  = v1.end();

        std::vector<int> v2;
        v2.insert(v2.end(),
                std::make_move_iterator(halfway),
                std::make_move_iterator(endItr));

        // sigsegv
        /*
        auto halfway2 = v1.begin() + v1.size() / 2;
        auto endItr2  = v1.end();
        v2.erase(halfway2, endItr2);
        */

        for (auto i : v1) std::cout << i << " ";
        std::cout << std::endl;

        for (auto i : v2) std::cout << i << " ";
        std::cout << std::endl;

        return 0;
}

Any sample code, ideas, links or anything else useful? Thanks.


Your problems with the deletion are due to a bug in your code.

// sigsegv
auto halfway2 = v1.begin() + v1.size() / 2;
auto endItr2  = v1.end();
v2.erase(halfway2, endItr2);

You try to erase from v2 with iterators pointing into v1. That won't work (passing another container's iterators to erase is undefined behavior); you probably wanted to call erase on v1.

That fixes your deletion problem when splitting the vector, and vector seems to be the best container for what you want.

Note that everything except split can be done in O(1) on a vector if you only insert at the end, and since order doesn't matter to you, I don't see any problem with that. Once you fix it, split will be O(n) in your implementation, but it should still be pretty fast: the elements sit right next to each other in the vector, which is very cache-friendly.
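A minimal sketch of that approach, assuming a hypothetical `VectorBag` wrapper (the name and interface are illustrative, not from any library). Because order is irrelevant, insert and pop both work at the back of the vector, and split moves the back half into the other bag and then erases that range from the vector the iterators actually belong to.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical unordered "bag" on top of std::vector.
// All insertions and removals happen at the back, since order does not matter.
template <class T>
class VectorBag {
 public:
  bool isEmpty() const { return items_.empty(); }
  std::size_t size() const { return items_.size(); }

  // Amortized O(1).
  void insert(T value) { items_.push_back(std::move(value)); }

  // O(1). Precondition: !isEmpty().
  T pop() {
    T value = std::move(items_.back());
    items_.pop_back();
    return value;
  }

  // O(n): moves the back half of the elements into `other`.
  void split(VectorBag& other) {
    auto halfway = items_.begin() + items_.size() / 2;
    other.items_.insert(other.items_.end(),
                        std::make_move_iterator(halfway),
                        std::make_move_iterator(items_.end()));
    // Erase with iterators into items_ itself, not into the other vector.
    items_.erase(halfway, items_.end());
  }

 private:
  std::vector<T> items_;
};
```

With elements 1 to 5 inserted, `split` leaves {1, 2} in the original bag and moves {3, 4, 5} to the other one.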

I can't think of a solution with all operations in O(1).

With a list you can have push and pop in O(1), and split in O(n) (because you need to find the middle of the list).
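A sketch of the list variant, assuming a hypothetical `ListBag` wrapper around `std::list`. Push and pop are O(1); split walks to the middle with `std::next` and moves the tail with `splice`. Note that for `std::list` a range splice between two different lists is itself linear in the number of moved nodes (the sizes have to be updated), so split stays O(n) either way.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>

// Hypothetical unordered bag on top of std::list.
template <class T>
class ListBag {
 public:
  bool isEmpty() const { return items_.empty(); }
  std::size_t size() const { return items_.size(); }

  void insert(const T& value) { items_.push_back(value); }  // O(1)

  // O(1). Precondition: !isEmpty().
  T pop() {
    T value = items_.back();
    items_.pop_back();
    return value;
  }

  // O(n): std::next walks to the middle node, and the range splice into
  // another list is also linear because both sizes must be updated.
  void split(ListBag& other) {
    auto middle = std::next(items_.begin(), items_.size() / 2);
    other.items_.splice(other.items_.end(), items_, middle, items_.end());
  }

 private:
  std::list<T> items_;
};
```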

With a balanced binary tree (not a search tree) you can have all operations in O(log n).

edit

There have been some suggestions that keeping track of the middle of the list would give O(1). This is not the case: when you split the list, you then have to compute the middle of the left list and the middle of the right list, which is O(n).

Another suggestion is that a vector is preferred simply because it is cache-friendly. I totally agree with this.

For fun, I implemented a balanced binary tree container that performs all operations in O(log n). insert and pop are obviously O(log n). The actual split is O(1); however, we are left with the root node, which we have to insert into one of the halves, so split is also O(log n). No copying is involved, though.

Here is my attempt at such a container (I haven't thoroughly tested it for correctness, and it can be further optimized, e.g. by turning the recursion into a loop).

#include <memory>
#include <iostream>
#include <utility>
#include <stdexcept>  // std::logic_error

template <class T>
class BalancedBinaryTree {
  private:
    class Node;

    std::unique_ptr<Node> root_;

  public:
    void insert(const T &data) {
      if (!root_) {
        root_ = std::unique_ptr<Node>(new Node(data));
        return;
      }
      root_->insert(data);
    }

    std::size_t getSize() const {
      if (!root_) {
        return 0;
      }
      return 1 + root_->getLeftCount() + root_->getRightCount();
    }

    // Tree must not be empty!!
    T pop() {
      if (root_->isLeaf()) {
        T temp = root_->getData();
        root_ = nullptr;
        return temp;
      }
      return root_->pop()->getData();
    }

    BalancedBinaryTree split() {
      if (!root_) {
        return BalancedBinaryTree();
      }

      BalancedBinaryTree left_half;
      T root_data = root_->getData();
      bool left_is_bigger = root_->getLeftCount() > root_->getRightCount();

      left_half.root_ = std::move(root_->getLeftChild());
      root_ = std::move(root_->getRightChild());

      if (left_is_bigger) {
        insert(root_data);
      } else {
        left_half.insert(root_data);
      }

      return left_half;  // local is moved implicitly; std::move would block NRVO
    }
};


template <class T>
class BalancedBinaryTree<T>::Node {
  private:
    T data_;
    std::unique_ptr<Node> left_child_, right_child_;
    std::size_t left_count_ = 0;
    std::size_t right_count_ = 0;

  public:
    Node() = default;
    Node(const T &data, std::unique_ptr<Node> left_child = nullptr,
         std::unique_ptr<Node> right_child = nullptr)
        : data_(data), left_child_(std::move(left_child)),
         right_child_(std::move(right_child)) {
    }

    bool isLeaf() const {
      return left_count_ + right_count_ == 0;
    }

    const T& getData() const {
      return data_;
    }
    T& getData() {
      return data_;
    }

    std::size_t getLeftCount() const {
      return left_count_;
    }

    std::size_t getRightCount() const {
      return right_count_;
    }

    std::unique_ptr<Node> &getLeftChild() {
      return left_child_;
    }
    const std::unique_ptr<Node> &getLeftChild() const {
      return left_child_;
    }
    std::unique_ptr<Node> &getRightChild() {
      return right_child_;
    }
    const std::unique_ptr<Node> &getRightChild() const {
      return right_child_;
    }

    void insert(const T &data) {
      if (left_count_ <= right_count_) {
        ++left_count_;
        if (left_child_) {
          left_child_->insert(data);
        } else {
          left_child_ = std::unique_ptr<Node>(new Node(data));
        }
      } else {
        ++right_count_;
        if (right_child_) {
          right_child_->insert(data);
        } else {
          right_child_ = std::unique_ptr<Node>(new Node(data));
        }
      }
    }

    std::unique_ptr<Node> pop() {
      if (isLeaf()) {
        throw std::logic_error("pop invalid path");
      }

      if (left_count_ > right_count_) {
        --left_count_;
        if (left_child_->isLeaf()) {
          return std::move(left_child_);
        }
        return left_child_->pop();
      }

      --right_count_;
      if (right_child_->isLeaf()) {
        return std::move(right_child_);
      }
      return right_child_->pop();
    }
};

usage:

int main() {
  BalancedBinaryTree<int> t;
  BalancedBinaryTree<int> t2;

  t.insert(3);
  t.insert(7);
  t.insert(17);
  t.insert(37);
  t.insert(1);

  t2 = t.split();

  while (t.getSize() != 0) {
    std::cout << t.pop() << " ";
  }
  std::cout << std::endl;

  while (t2.getSize() != 0) {
    std::cout << t2.pop() << " ";
  }
  std::cout << std::endl;

  return 0;
}

output:

1 17
3 37 7

If the number of elements/bytes stored in your container at any one time is large, Youda008's solution (using a list and keeping track of the middle) may not be as efficient as you hope, because list nodes are scattered in memory and are not cache-friendly.

Alternatively, you could use a list<vector<T>> or even a list<array<T,Capacity>> and keep track of the middle of the list, i.e. split only between two sub-containers, never inside one. This should give you O(1) on all operations as well as reasonable cache efficiency. Use array<T,Capacity> if a single value of Capacity serves your needs at all times (for Capacity=1, this reverts to an ordinary list). Otherwise, use vector<T> and adapt the capacity of new vectors to demand.
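A rough sketch of the chunked idea, assuming a hypothetical `ChunkedBag` class with a fixed chunk capacity `kCapacity` (the name and the default of 64 are arbitrary choices for illustration). It uses `std::vector` chunks with reserved capacity rather than `std::array`. Insert and pop touch only the last chunk; split moves whole chunks, so it never copies elements but is only as precise as the chunk size.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

// Hypothetical list of fixed-capacity chunks. Invariant: no chunk is empty.
template <class T, std::size_t kCapacity = 64>
class ChunkedBag {
 public:
  bool isEmpty() const { return chunks_.empty(); }

  // Amortized O(1): append to the last chunk, opening a new one when it is full.
  void insert(const T& value) {
    if (chunks_.empty() || chunks_.back().size() == kCapacity) {
      chunks_.emplace_back();
      chunks_.back().reserve(kCapacity);
    }
    chunks_.back().push_back(value);
  }

  // O(1). Precondition: !isEmpty(). A chunk that becomes empty is dropped.
  T pop() {
    std::vector<T>& last = chunks_.back();
    T value = last.back();
    last.pop_back();
    if (last.empty()) chunks_.pop_back();
    return value;
  }

  // O(number of chunks): moves the back half of the chunks (not elements).
  void split(ChunkedBag& other) {
    auto middle = std::next(chunks_.begin(), chunks_.size() / 2);
    other.chunks_.splice(other.chunks_.end(), chunks_, middle, chunks_.end());
  }

  // O(number of chunks); good enough for a sketch.
  std::size_t size() const {
    std::size_t n = 0;
    for (const auto& chunk : chunks_) n += chunk.size();
    return n;
  }

 private:
  std::list<std::vector<T>> chunks_;
};
```

For example, with a capacity of 2 and elements 1 to 5, the chunks are {1,2}, {3,4}, {5}; a split keeps {1,2} and moves the other two chunks.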

bolov correctly points out that finding the middles of the two lists that result from splitting one list is not O(1). This implies that keeping track of the middle is not useful. However, a list<sub_container> is still faster than a plain list, because the split only costs O(n/Capacity) rather than O(n). The price you pay is that the split has a granularity of Capacity elements rather than 1. Thus, you must trade off the accuracy of a split against its cost.

Another option is to implement your own container using a linked list and a pointer to the middle element, at which you want to split it. This pointer is updated on every modifying operation. This way you can achieve O(1) complexity on all operations.
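A sketch of the middle-pointer idea, assuming a hypothetical `MidTrackedList` built on `std::list` with the invariant `mid_ == std::next(begin, size / 2)`. Keeping `mid_` up to date on insert and pop is indeed O(1). The split, however, has to re-establish that invariant for both halves with `std::next`, which is linear; this is the objection raised by bolov. (With `std::list`, the range splice between two lists is also linear.)

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>

// Hypothetical list that tracks its middle element.
// Invariant: mid_ == std::next(items_.begin(), items_.size() / 2).
template <class T>
class MidTrackedList {
 public:
  MidTrackedList() : mid_(items_.end()) {}
  // mid_ points into items_, so copying would leave it dangling.
  MidTrackedList(const MidTrackedList&) = delete;
  MidTrackedList& operator=(const MidTrackedList&) = delete;

  bool isEmpty() const { return items_.empty(); }
  std::size_t size() const { return items_.size(); }

  // O(1): append at the back; the middle index grows when the size becomes even.
  void insert(const T& value) {
    items_.push_back(value);
    if (items_.size() == 1) {
      mid_ = items_.begin();
    } else if (items_.size() % 2 == 0) {
      ++mid_;
    }
  }

  // O(1): remove from the back; the middle index shrinks when the old size was even.
  // Precondition: !isEmpty().
  T pop() {
    T value = items_.back();
    if (items_.size() % 2 == 0) --mid_;
    items_.pop_back();
    if (items_.empty()) mid_ = items_.end();
    return value;
  }

  // Moves [mid_, end) into `out`, then recomputes both middles: O(n) overall.
  void split(MidTrackedList& out) {
    out.items_.splice(out.items_.end(), items_, mid_, items_.end());
    mid_ = std::next(items_.begin(), items_.size() / 2);
    out.mid_ = std::next(out.items_.begin(), out.items_.size() / 2);
  }

 private:
  std::list<T> items_;
  typename std::list<T>::iterator mid_;
};
```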

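Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.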

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM