Multithreaded Insertion in Binary Search Tree (BST)
I have a task of inserting multiple elements into a BST. The task has to be optimized by using multiple threads. There is no limit on how many threads can be launched.
Here is my approach. It is purely theoretical: I have not tried implementing it and have no idea how well it will work. Please share your opinions on this idea.
The BST node will look something like this:
class BSTNode {
    int val;
    BSTNode left, right;
    boolean leftLock, rightLock;
    Queue<BSTNode> leftQ, rightQ;
}
I am not using any Java locks; instead I use two boolean variables to denote the state of the lock.
Since inserting any element first requires finding the relevant position, this search can be carried out in parallel, because it does not modify the tree. The search can only proceed as long as the nodes on the path are unlocked. If a particular node is locked, the inserting thread is put to sleep(), added to the corresponding left or right queue, and woken up again when the lock is released.
On the other hand, if none of the nodes on the path is locked, we can proceed with the insertion. Before inserting and modifying the corresponding pointer of the parent, a lock must be acquired on the parent.
Can anyone share their views on this implementation method?
This is really an exercise in trying to implement your own lock. What you've done is create an unpacked lock (boolean, waitingQueue). But the only way this approach would work safely is if you externally synchronize access to the 'lock' boolean variables and queues. So to make this non-lock code work, you'd have to use a lock.
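To illustrate the point about needing a real lock, here is a minimal sketch of the question's per-node idea with a java.util.concurrent.locks.ReentrantLock in place of the boolean flags and queues. The class name LockedBst, the fixed root value, and the insert-only scope are assumptions made for the example; it is not the asker's code, and it does not handle deletion or rebalancing.

```java
import java.util.concurrent.locks.ReentrantLock;

class LockedBst {
    static final class Node {
        final int val;
        Node left, right; // only read or written while holding `lock`
        final ReentrantLock lock = new ReentrantLock();
        Node(int val) { this.val = val; }
    }

    private final Node root; // assumption: the tree starts with one node

    LockedBst(int rootValue) { root = new Node(rootValue); }

    boolean insert(int value) {
        Node cur = root;
        while (true) {
            if (value == cur.val) return false; // duplicate
            Node next;
            cur.lock.lock();
            try {
                // The "is the child empty?" check and the pointer write
                // happen under the same lock, so they cannot be interleaved.
                if (value < cur.val) {
                    if (cur.left == null) { cur.left = new Node(value); return true; }
                    next = cur.left;
                } else {
                    if (cur.right == null) { cur.right = new Node(value); return true; }
                    next = cur.right;
                }
            } finally {
                cur.lock.unlock();
            }
            cur = next; // descend; the lock on the old node is already released
        }
    }

    boolean contains(int value) {
        Node cur = root;
        while (cur != null) {
            if (value == cur.val) return true;
            Node node = cur;
            node.lock.lock();
            try {
                cur = (value < node.val) ? node.left : node.right;
            } finally {
                node.lock.unlock();
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        LockedBst tree = new LockedBst(10);
        Thread t1 = new Thread(() -> tree.insert(25));
        Thread t2 = new Thread(() -> tree.insert(26));
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(tree.contains(25) && tree.contains(26));
    }
}
```

The lock also takes care of blocking and waking waiting threads, which is the job the hand-written queues and sleep() were meant to do.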
If you didn't use a lock you would have several problems relating to concurrency:
If you want a thread-safe, ordered, mutable container with sub-linear search times, use ConcurrentSkipListSet<Integer>.
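As a rough illustration of that suggestion, the sketch below has several threads add integers to one ConcurrentSkipListSet<Integer>. The class name SkipListInsertDemo, the helper insertAll, the thread count, and the 30-second timeout are all assumptions picked for the example.

```java
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class SkipListInsertDemo {
    // Inserts the values 0 .. threads*perThread-1, split across `threads` workers.
    static ConcurrentSkipListSet<Integer> insertAll(int threads, int perThread) {
        ConcurrentSkipListSet<Integer> set = new ConcurrentSkipListSet<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            pool.execute(() -> {
                for (int i = 0; i < perThread; i++) {
                    set.add(base + i); // thread-safe without external locking
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return set;
    }

    public static void main(String[] args) {
        ConcurrentSkipListSet<Integer> set = insertAll(4, 1000);
        System.out.println(set.size() + " elements, first=" + set.first()
                + ", last=" + set.last());
    }
}
```

The set keeps its elements sorted, so first(), last(), and in-order iteration behave much like an in-order walk of a BST.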
An interesting problem. There are some points in your approach that you will have to rethink.
"Since insertion of any element first requires us to find the relevant position, this task can be carried out in parallel"
The two steps, finding a suitable place + actual insertion, have to be treated as a single operation, insert. The reason is that, in a situation where one thread (say t1) has finished finding a suitable place and then tries the actual insertion, but has to hold on because another thread (say t2) is doing its actual insertion, the first thread (t1) will have to do the place calculation again, because the tree has changed after the second thread's (t2) actual insertion. (I think you get what I mean.) So in conclusion, parallel insertion into a binary search tree would not benefit you, since one insertion cannot be carried out independently of another.
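A minimal sketch of what "search plus link as one operation" can look like, assuming a single monitor that guards the whole tree. The class name SynchronizedBst is made up for the example. Inserts are fully serialized here, which matches the conclusion above: extra threads stay correct but gain nothing on the insertion itself.

```java
class SynchronizedBst {
    private static final class Node {
        final int val;
        Node left, right;
        Node(int val) { this.val = val; }
    }

    private Node root;
    private int size;

    // Finding the place and writing the pointer happen under one monitor,
    // so the tree cannot change between the two steps.
    public synchronized boolean insert(int value) {
        if (root == null) {
            root = new Node(value);
            size++;
            return true;
        }
        Node cur = root;
        while (true) {
            if (value == cur.val) return false; // duplicate
            if (value < cur.val) {
                if (cur.left == null) { cur.left = new Node(value); size++; return true; }
                cur = cur.left;
            } else {
                if (cur.right == null) { cur.right = new Node(value); size++; return true; }
                cur = cur.right;
            }
        }
    }

    public synchronized int size() { return size; }

    public synchronized boolean contains(int value) {
        Node cur = root;
        while (cur != null) {
            if (value == cur.val) return true;
            cur = (value < cur.val) ? cur.left : cur.right;
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        SynchronizedBst tree = new SynchronizedBst();
        Thread t1 = new Thread(() -> { for (int i = 0; i < 100; i += 2) tree.insert(i); });
        Thread t2 = new Thread(() -> { for (int i = 1; i < 100; i += 2) tree.insert(i); });
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(tree.size());
    }
}
```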
I will explain only one problem, which shows the basic loophole in this approach.
Assume that for now you support only the insert operation and no others. The following could be an implementation of insert:
//Using C
BSTNode* insert(BSTNode* root,int value)
{
1      if(root == NULL){
2          return createNewNode(value);
3      }
4
5      if(root->data == value){
6          return root;
7      }
8      else if(root->data > value){
9          while(root->leftLock);
10         if(!root->left){
11             root->leftLock = true;
12             root->left = insert(root->left,value);
13             root->leftLock = false;
14         }
15         else{
16             root->left = insert(root->left,value);
17         }
18     }
19     else{
20         while(root->rightLock);
21         if(!root->right){
22             root->rightLock = true;
23             root->right = insert(root->right,value);
24             root->rightLock = false;
25         }
26         else{
27             root->right = insert(root->right,value);
28         }
29     }
30
31     return root;
32
}
In this approach, only the children of the last node on the path (a leaf node) are updated when a value is inserted, so we do no locking while updating the parents (when returning from the recursion).
I am avoiding queuing of insertion requests and using only spinlocks to keep it a little simpler. However, the point I am going to raise applies equally to that case.
Consider this BST:
        10
       /  \
      5    15
     / \   / \
    2   6 13  20
Suppose two threads, t1 and t2, are invoked simultaneously to insert the values 25 and 26 respectively, and both are currently at the BSTNode with value 20 (the rightmost node).
Now let's execute the above code with context switches between the threads:
a. t1:
1. if(root == NULL) //not true, will go to line 5.
//switch
b. t2:
1. if(root == NULL) //not true, will go to line 5.
//switch
c. t1:
5. if(root->data == value){ //not true, will go to line 8.
8. else if(root->data > value) //not true, will go to line 19.
//switch
d. t2:
5. if(root->data == value){ //not true, will go to line 8.
//switch
e. t1:
19 else{
20 while(root->rightLock); // lock is not held by anyone, so continue.
21 if(!root->right){
//switch
f. t2:
8. else if(root->data > value) //not true, will go to line 19.
19 else{
20 while(root->rightLock); // lock is not held by anyone, so continue.
21 if(!root->right){
22 root->rightLock = true;
//switch
g. t1:
22 root->rightLock = true;
23 root->right = insert(root->right,value);
//switch
h. t2:
23 root->right = insert(root->right,value);
24 root->rightLock = false;
//switch
Assume that line 23 covers the complete execution of that line.
As you can see in sections f, g and h, both t1 and t2 enter the critical section without knowing of each other's presence. The code was not supposed to allow that.
What's the problem, then?
The problem is that there is a piece of code that was supposed to be executed in one go:
20 while(root->rightLock);
21 if(!root->right){
22 root->rightLock = true;
So we may need some hardware support, in the form of an uninterruptible (atomic) instruction that performs all three steps above together.
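In Java, one way to get that kind of atomic step is a compare-and-set on the child pointer itself. The sketch below swaps the boolean lock flags for java.util.concurrent.atomic.AtomicReference fields, so "the slot is still empty" and "store the new node" become a single compareAndSet call. The class name CasBst and the insert-only scope are assumptions; deletion would need considerably more care.

```java
import java.util.concurrent.atomic.AtomicReference;

class CasBst {
    static final class Node {
        final int val;
        final AtomicReference<Node> left = new AtomicReference<>();
        final AtomicReference<Node> right = new AtomicReference<>();
        Node(int val) { this.val = val; }
    }

    private final AtomicReference<Node> root = new AtomicReference<>();

    public boolean insert(int value) {
        Node fresh = new Node(value);
        AtomicReference<Node> slot = root;
        while (true) {
            Node cur = slot.get();
            if (cur == null) {
                // Atomic check-and-write: succeeds only if the slot is still empty.
                if (slot.compareAndSet(null, fresh)) return true;
                continue; // another thread filled the slot; re-read it and descend
            }
            if (value == cur.val) return false; // duplicate
            slot = (value < cur.val) ? cur.left : cur.right;
        }
    }

    public boolean contains(int value) {
        Node cur = root.get();
        while (cur != null) {
            if (value == cur.val) return true;
            cur = (value < cur.val) ? cur.left.get() : cur.right.get();
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        CasBst tree = new CasBst();
        for (int v : new int[] {10, 5, 15, 2, 6, 13, 20}) tree.insert(v);
        // The scenario from the trace: two threads race to hang a child off 20.
        Thread t1 = new Thread(() -> tree.insert(25));
        Thread t2 = new Thread(() -> tree.insert(26));
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(tree.contains(25) && tree.contains(26));
    }
}
```

If both threads see an empty right child of 20, only one compareAndSet succeeds; the loser re-reads the slot, finds the winner's node, and continues one level further down, so neither value is lost.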