
Multithreaded Insertion in Binary Search Tree (BST)

I have been given the task of inserting multiple elements into a BST. The task has to be optimized by using multiple threads. There is no limit on how many threads can be launched.

Here is my approach. It is purely theoretical: I have not tried implementing it and have no idea how well it will work. Please share your opinions on this idea.

The BST node will look something like this:

class BSTNode {
    int val;
    BSTNode left, right;
    boolean leftLock, rightLock;
    Queue<BSTNode> leftQ, rightQ;
}

I am not using any Java locks; instead, two boolean variables denote the state of the lock.

Since inserting any element first requires finding the relevant position, the search can be carried out in parallel, because it does not modify the tree. The search can proceed only while the nodes on its path are unlocked. If a particular node is locked, the inserting thread is put to sleep(), added to the corresponding left or right queue, and woken up again when the lock is released.

On the other hand, if none of the nodes on the path are locked, we can go ahead with the insertion. Before inserting and modifying the corresponding pointer of the parent, a lock must be acquired on the parent.

Can anyone share their views on this implementation method?

This is really an exercise in trying to implement your own lock. What you've done is create an unpacked lock (boolean, waitingQueue). But the only way this approach would work safely is if you externally synchronize access to the 'lock' boolean variables and queues. So to make this non-lock code work correctly, you'd have to use a lock.

If you didn't use a lock, you would have several problems relating to concurrency:

  • There is no happens-before relationship between writes to any of the values in the node. That is, other threads are not guaranteed to see updated values for any of the fields. This alone could cause all sorts of trouble. However, there are more concrete examples.
  • No thread knows whether the boolean lock holds its current value because that thread changed it or because any number of other threads changed it (a race condition). Essentially, no thread would know whether it 'owns' the lock. There is a fix for this using a built-in class, but there are enough other problems that this isn't worth pursuing.
  • There is another race condition between checking the lock and inserting yourself into the queue. One thread may see that another thread 'has' the lock (which is dubious given the second point) and add itself to the queue. But by the time it adds itself to the queue, the lock may already have been released, and it may wait forever if no other threads touch that part of the tree.
  • Poor performance. Each thread can only view one node at a time. Even if you converted the boolean/queue constructs to locks, you're likely not going to get good performance, because even search()-type operations on the tree are going to require locks to ensure correct memory visibility and happens-before relationships.
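
The answer does not name the "built in class" from the second bullet. One candidate, offered here only as an assumption, is java.util.concurrent.atomic.AtomicBoolean: its compareAndSet call reports whether the calling thread was the one that flipped the flag, so ownership is no longer ambiguous. A minimal sketch (FlagLock is a hypothetical name, and this addresses only the ownership problem, not the queueing or performance problems):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper, not part of the question's code.
class FlagLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    // Returns true only for the single thread whose call changed false -> true.
    // Every other concurrent caller gets false and knows it does not own the lock.
    boolean tryLock() {
        return locked.compareAndSet(false, true);
    }

    // Must only be called by the thread whose tryLock() returned true.
    void unlock() {
        locked.set(false);
    }
}
```

A caller would loop on tryLock() (or back off) until it returns true, then call unlock() in a finally block.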

If you want a thread-safe, ordered, mutable container with sub-linear search times, use ConcurrentSkipListSet<Integer>.
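
As a rough illustration of that suggestion, here is a sketch that fills a ConcurrentSkipListSet<Integer> from several threads. The class name SkipListInsertDemo, the method insertInParallel, and the way values are split across threads are all made up for this example:

```java
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; only ConcurrentSkipListSet comes from the answer.
class SkipListInsertDemo {
    // Inserts the values 0..count-1 using the given number of threads.
    static ConcurrentSkipListSet<Integer> insertInParallel(int count, int threads) {
        ConcurrentSkipListSet<Integer> set = new ConcurrentSkipListSet<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int start = t;
            // Each task inserts start, start + threads, start + 2 * threads, ...
            pool.execute(() -> {
                for (int v = start; v < count; v += threads) {
                    set.add(v); // thread-safe, no external locking needed
                }
            });
        }
        pool.shutdown();
        try {
            // Wait for all insert tasks to finish before handing the set back.
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return set;
    }
}
```

Iterating the returned set yields the elements in ascending order, which is the property the BST was meant to provide.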

An interesting problem. There are some points in your approach that you will have to rethink.

"Since insertion of any element first requires us to find the relevant position, this task can be carried out in parallel"

  • Actually, this is wrong. It is true that the search itself does not modify the tree, but since other threads in the background are trying to modify the tree (by inserting nodes), you have to apply a lock/semaphore here.
  • And you have to perform "finding a suitable place" + "actual insertion" as a single insert operation. The reason is this: suppose one thread (say t1) has finished finding a suitable place and then attempts the actual insertion, but has to wait because another thread (say t2) is doing its actual insertion. The first thread (t1) will then have to redo the place calculation, because the tree has changed after t2's actual insertion. (I think you get what I mean.)
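
To make the second point concrete, here is a minimal sketch in which the search and the link step run under one lock, so no other insertion can change the tree in between. The class LockedBst and the choice of synchronized methods are assumptions made for illustration, not something the answer prescribes:

```java
// Hypothetical class: one coarse lock (the object's monitor) guards the whole tree.
class LockedBst {
    private static final class Node {
        final int val;
        Node left, right;
        Node(int val) { this.val = val; }
    }

    private Node root;
    private int size;

    // "Find a suitable place" and "actual insertion" form one atomic operation.
    // Returns false if the value is already present.
    synchronized boolean insert(int value) {
        if (root == null) {
            root = new Node(value);
            size++;
            return true;
        }
        Node cur = root;
        while (true) {
            if (value == cur.val) {
                return false;
            }
            if (value < cur.val) {
                if (cur.left == null) {
                    cur.left = new Node(value);
                    size++;
                    return true;
                }
                cur = cur.left;
            } else {
                if (cur.right == null) {
                    cur.right = new Node(value);
                    size++;
                    return true;
                }
                cur = cur.right;
            }
        }
    }

    synchronized int size() {
        return size;
    }
}
```

This version is correct under concurrent callers, but insertions are fully serialized, so extra threads give no speedup for the insert itself.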

So in conclusion, parallel insertion into a binary search tree would not benefit you, since one binary search tree insertion cannot be carried out independently of another.

I will explain just one problem, which exposes the basic flaw in this approach.

Assume that, for now, you support only the insert operation and no other operations. The following could be an implementation of insert:

//Using C
BSTNode* insert(BSTNode* root,int value)
{
1    if(root == NULL){
2       return createNewNode(value);
3    }
4    
5    if(root->data == value){
6        return root;
7    } 
8    else if(root->data > value){
9        while(root->leftLock);
10        if(!root->left){
11            root->leftLock = true;
12            root->left = insert(root->left,value);
13            root->leftLock = false;
14        }
15        else{
16            root->left = insert(root->left,value);
17        }
18    }
19    else{
20        while(root->rightLock);
21        if(!root->right){
22            root->rightLock = true;
23            root->right = insert(root->right,value);
24            root->rightLock = false;
25        }
26        else{
27            root->right = insert(root->right,value);
28        }
29    }
30    
31    return root;
32    
}

In this approach, only the child pointer of the last node on the path (a leaf node) gets updated when a value is inserted, so we do not do any locking while updating the parents (when the recursion unwinds).

I am avoiding insertion-request queuing and using spinlocks only to keep things a little simpler. However, the point I am going to raise applies to the queuing case too.

Consider this BST:

    10
   /  \
  5    15
 / \  /  \
2   6 13  20

Suppose two threads, t1 and t2, are invoked simultaneously, trying to insert the values 25 and 26 respectively, and both are currently at the BSTNode with value 20 (the rightmost node).

Now let's execute the above code with context switches between the threads:

a. t1:
          1. if(root == NULL)  //not true, will go to line 5.
          //switch

b. t2:
          1. if(root == NULL)  //not true, will go to line 5.
          //switch

c. t1:
          5. if(root->data == value){  //not true, will go to line 8.
          8. else if(root->data > value) //not true, will go to line 19.
          //switch

d. t2:
          5. if(root->data == value){  //not true, will go to line 8.
          //switch

e. t1:
          19    else{
          20        while(root->rightLock);  // lock is not held by anyone, so continue.
          21        if(!root->right){
          //switch
f. t2:
          8. else if(root->data > value) //not true, will go to line 19.
          19    else{
          20        while(root->rightLock);  // lock is not held by anyone, so continue.
          21        if(!root->right){
          22            root->rightLock = true;
          //switch

g. t1:
          22            root->rightLock = true;
          23            root->right = insert(root->right,value);
          //switch

h. t2: 
          23            root->right = insert(root->right,value);
          24            root->rightLock = false;
          //switch

Assume that each occurrence of line 23 above represents the complete execution of that line.

As you can see in steps f, g and h, both t1 and t2 enter the critical section without being aware of each other. The code was not supposed to allow that.

What's the problem, then?

The problem is that there is a piece of code that was supposed to be executed in one go:

20        while(root->rightLock);
21        if(!root->right){
22            root->rightLock = true;

So we may need some hardware support: an uninterruptible (atomic) instruction of our own that performs all three steps above together.
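
In Java, such atomic check-and-set instructions are exposed through the java.util.concurrent.atomic classes. The sketch below is one possible way to apply that idea, not the code from this answer: it assumes an insert-only tree (no deletion or rebalancing) and replaces the separate lock flags with a compareAndSet on the child reference itself, so "is the slot empty?" and "link the new node" happen as one step. CasBst and its methods are hypothetical names:

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical insert-only BST; deletion is deliberately not supported.
class CasBst {
    private static final class Node {
        final int val;
        final AtomicReference<Node> left = new AtomicReference<>();
        final AtomicReference<Node> right = new AtomicReference<>();
        Node(int val) { this.val = val; }
    }

    private final AtomicReference<Node> root = new AtomicReference<>();

    // Returns false if the value is already present.
    boolean insert(int value) {
        Node fresh = new Node(value);
        AtomicReference<Node> slot = root;
        while (true) {
            Node cur = slot.get();
            if (cur == null) {
                // Atomic "check for empty, then link": succeeds for exactly one thread.
                if (slot.compareAndSet(null, fresh)) {
                    return true;
                }
                continue; // another thread filled this slot first; re-read it
            }
            if (value == cur.val) {
                return false;
            }
            slot = (value < cur.val) ? cur.left : cur.right;
        }
    }

    boolean contains(int value) {
        Node cur = root.get();
        while (cur != null) {
            if (value == cur.val) {
                return true;
            }
            cur = (value < cur.val) ? cur.left.get() : cur.right.get();
        }
        return false;
    }
}
```

In the t1/t2 scenario above, both threads would attempt the compareAndSet on node 20's right slot; only one would succeed, and the other would re-read the slot, see the newly linked node, and continue descending from there.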
