The right way to build a binary search tree in OCaml

Question

OK, I have written a binary search tree in OCaml.

type 'a bstree = 
    |Node of 'a * 'a bstree * 'a bstree
    |Leaf


let rec insert x = function
    |Leaf -> Node (x, Leaf, Leaf)
    |Node (y, left, right) as node -> 
        if x < y then
            Node (y, insert x left, right)
        else if x > y then
            Node (y, left, insert x right)
        else
            node

The code above was described as good in the question "The right way to use a data structure in OCaml".

However, I think I have found a problem. This insert seems to work only when building a BST from a list in one go, for example:

let rec set_of_list = function
   | [] -> Leaf
   | x :: l -> insert x (set_of_list l);;

So if we build a BST from a list in one continuous pass, there is no problem: we get a complete BST that contains every element of the list.

However, if I have a BST that was built earlier and I now want to insert a node, then the resulting BST won't contain all the nodes of the previous tree, am I right?

Then how should I write a BST in OCaml so that each insert creates a new BST containing all nodes of the previous tree, while the previous tree stays immutable? If I have to copy every node of the old BST each time, will that hurt performance?

Edit:

So let's say that initially a BST is created with one node, t1 = (10, Leaf, Leaf).

Then I do let t2 = insert 5 t1, and I get t2 = (10, (5, Leaf, Leaf), Leaf), right? Inside t2, let's call the child node (5, Leaf, Leaf) c1.

Then I do let t3 = insert 12 t2, and I get t3 = (10, (5, Leaf, Leaf), (12, Leaf, Leaf)). Inside t3, let's call the child node (5, Leaf, Leaf) c2.

So my question is whether c1 == c2. Are the two (5, Leaf, Leaf) nodes in t2 and t3 exactly the same?

Answer 1

I'll try to answer the sharing part of your question. The short answer is yes, the two parts of the two trees will be identical. The reason immutable data works so well is that there are no limitations on the possible sharing. That's why FP works so well.

Here's a session that does what you describe:

# let t1 = Node (10, Leaf, Leaf);;
val t1 : int bstree = Node (10, Leaf, Leaf)
# let t2 = insert 5 t1;;
val t2 : int bstree = Node (10, Node (5, Leaf, Leaf), Leaf)
# let t3 = insert 12 t2;;
val t3 : int bstree = Node (10, Node (5, Leaf, Leaf), Node (12, Leaf, Leaf))
# let Node (_, c1, _) = t2;;
val c1 : int bstree = Node (5, Leaf, Leaf)
# let Node (_, c2, _) = t3;;
val c2 : int bstree = Node (5, Leaf, Leaf)
# c1 == c2;;
- : bool = true

The long answer is that there's no guarantee that the two parts will be identical. If the compiler and/or runtime can see a reason to copy a subtree, it's also free to do that. There are cases (as in distributed processing) where that would be a better choice. Again, the great thing about FP is that there are no limitations on sharing, which means that sharing is neither required nor forbidden in such cases.
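To make the performance side concrete, here is a minimal sketch that reuses the bstree type and insert from the question. The helpers elements and left_child are hypothetical names introduced only for this illustration. It checks that the old tree is left intact, that the new tree contains everything from the old one, and that the subtree off the insertion path is physically reused. That last check reflects how current OCaml implementations typically behave, not a language guarantee, as noted above.

```ocaml
type 'a bstree =
  | Node of 'a * 'a bstree * 'a bstree
  | Leaf

let rec insert x = function
  | Leaf -> Node (x, Leaf, Leaf)
  | Node (y, left, right) as node ->
    if x < y then Node (y, insert x left, right)
    else if x > y then Node (y, left, insert x right)
    else node

(* Hypothetical helper: in-order traversal, yields the elements sorted. *)
let rec elements = function
  | Leaf -> []
  | Node (x, left, right) -> elements left @ (x :: elements right)

(* Hypothetical helper: left child of a node, Leaf for an empty tree. *)
let left_child = function
  | Leaf -> Leaf
  | Node (_, left, _) -> left

let t2 = insert 5 (insert 10 Leaf)
let t3 = insert 12 t2

let () =
  (* The old tree still holds exactly its old elements. *)
  assert (elements t2 = [5; 10]);
  (* The new tree holds everything from the old tree plus 12. *)
  assert (elements t3 = [5; 10; 12]);
  (* The left subtree was not on the insertion path, so it is reused. *)
  assert (left_child t2 == left_child t3);
  (* The root was on the insertion path, so a new root was allocated. *)
  assert (not (t2 == t3))
```

Only the nodes on the path from the root to the insertion point are rebuilt, so an insert allocates a number of nodes proportional to the height of the tree rather than copying the whole tree.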

Answer 2

Look at the accepted answer to the linked question. To be specific, this line here:

let tree_of_list l = List.fold_right insert l Leaf

Work out the chain of what is happening. Take the list [1; 2; 3]. Because List.fold_right starts from the right end of the list, the call expands to insert 1 (insert 2 (insert 3 Leaf)).

First we have no tree, and the result of insert 3 Leaf.

Call this T1.

Next is the tree generated by insert 2 T1.

Call this T2.

Then the tree generated by insert 1 T2.

This is what is returned as the result of tree_of_list.

If we call the result T3 and then somewhere else in the code call insert 4 T3, the result returned from insert is no different from calling tree_of_list with the list [4; 1; 2; 3]: every element of T3 is still there, plus the new one.
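The chain can be checked directly. The following is a minimal sketch that repeats the type and insert from the question so that it is self-contained; elements is a hypothetical helper added only for the comparison. Since List.fold_right inserts the last list element first, the shape of a tree depends on insertion order, so the sketch compares trees by their sorted contents rather than by shape.

```ocaml
type 'a bstree =
  | Node of 'a * 'a bstree * 'a bstree
  | Leaf

let rec insert x = function
  | Leaf -> Node (x, Leaf, Leaf)
  | Node (y, left, right) as node ->
    if x < y then Node (y, insert x left, right)
    else if x > y then Node (y, left, insert x right)
    else node

let tree_of_list l = List.fold_right insert l Leaf

(* Hypothetical helper: in-order traversal, yields the elements sorted. *)
let rec elements = function
  | Leaf -> []
  | Node (x, left, right) -> elements left @ (x :: elements right)

(* Built in one go: insert 1 (insert 2 (insert 3 Leaf)). *)
let t3 = tree_of_list [1; 2; 3]

(* Extended later, somewhere else in the code. *)
let t4 = insert 4 t3

let () =
  (* 3 was inserted first, so it is the root. *)
  assert (t3 = Node (3, Node (2, Node (1, Leaf, Leaf), Leaf), Leaf));
  (* Inserting later loses nothing from the earlier tree. *)
  assert (elements t4 = [1; 2; 3; 4]);
  (* Same contents as building from the four-element list in one go. *)
  assert (elements t4 = elements (tree_of_list [1; 2; 3; 4]));
  (* The earlier tree is unchanged. *)
  assert (elements t3 = [1; 2; 3])
```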

Answer 1
4 2013-01-22 16:16:15

Answer 2
2 Accepted 2013-01-22 15:23:36
