How do I express mutually recursive data structures in safe Rust?

I am trying to implement a scenegraph-like data structure in Rust. I would like an equivalent to this C++ code expressed in safe Rust:

#include <algorithm>
#include <vector>

struct Node
{
    Node* parent = nullptr; // should be mutable, and nullable (no parent)
    std::vector<Node*> child;

    virtual ~Node()
    {
        for (auto it = child.begin(); it != child.end(); ++it)
        {
            delete *it;
        }
    }

    void addNode(Node* newNode)
    {
        if (newNode->parent)
        {
            // Detach the node from its previous parent's child list.
            auto& siblings = newNode->parent->child;
            siblings.erase(std::find(siblings.begin(), siblings.end(), newNode));
        }
        newNode->parent = this;
        child.push_back(newNode);
    }
};

Properties I want:

  • the parent takes ownership of its children
  • the nodes are accessible from the outside via a reference of some kind
  • events that touch one Node can potentially mutate the whole tree

Rust tries to ensure memory safety by forbidding you from doing things that might potentially be unsafe. Since this analysis is performed at compile-time, the compiler can only reason about a subset of manipulations that are known to be safe.

In Rust, you could easily store either a reference to the parent (by borrowing the parent, thus preventing mutation) or the list of child nodes (by owning them, which gives you more freedom), but not both (without using unsafe). This is especially problematic for your implementation of addNode, which requires mutable access to the given node's parent. However, if you store the parent pointer as a mutable reference, then, since only a single mutable reference to a particular object may be usable at a time, the only way to access the parent would be through a child node, and you'd only be able to have a single child node, otherwise you'd have two mutable references to the same parent node.

If you want to avoid unsafe code, there are many alternatives, but they'll all require some sacrifices.


The easiest solution is to simply remove the parent field. We can define a separate data structure to remember the parent of a node while we traverse a tree, rather than storing it in the node itself.

First, let's define Node:

#[derive(Debug)]
struct Node<T> {
    data: T,
    children: Vec<Node<T>>,
}

impl<T> Node<T> {
    fn new(data: T) -> Node<T> {
        Node { data: data, children: vec![] }
    }

    fn add_child(&mut self, child: Node<T>) {
        self.children.push(child);
    }
}

(I added a data field because a tree isn't super useful without data at the nodes!)

Let's now define another struct to track the parent as we navigate:

#[derive(Debug)]
struct NavigableNode<'a, T: 'a> {
    node: &'a Node<T>,
    parent: Option<&'a NavigableNode<'a, T>>,
}

impl<'a, T> NavigableNode<'a, T> {
    fn child(&self, index: usize) -> NavigableNode<T> {
        NavigableNode {
            node: &self.node.children[index],
            parent: Some(self)
        }
    }
}

impl<T> Node<T> {
    fn navigate<'a>(&'a self) -> NavigableNode<T> {
        NavigableNode { node: self, parent: None }
    }
}

This solution works fine if you don't need to mutate the tree as you navigate it and you can keep the parent NavigableNode objects around (which works fine for a recursive algorithm, but doesn't work too well if you want to store a NavigableNode in some other data structure and keep it around). The second restriction can be alleviated by using something other than a borrowed pointer to store the parent; if you want maximum genericity, you can use the Borrow trait to allow direct values, borrowed pointers, Boxes, Rcs, etc.
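To illustrate the recursive use case mentioned above, here is a minimal sketch that walks a tree and records each node's depth by following the parent chain. The depth method and the collect_depths function are hypothetical helpers added for illustration; they are not part of the original answer. The lifetimes are written out explicitly ('_) to keep recent compilers quiet, and the T: 'a bound is dropped because current Rust infers it.

```rust
struct Node<T> {
    data: T,
    children: Vec<Node<T>>,
}

impl<T> Node<T> {
    fn new(data: T) -> Node<T> {
        Node { data, children: vec![] }
    }

    fn add_child(&mut self, child: Node<T>) {
        self.children.push(child);
    }

    fn navigate(&self) -> NavigableNode<'_, T> {
        NavigableNode { node: self, parent: None }
    }
}

struct NavigableNode<'a, T> {
    node: &'a Node<T>,
    parent: Option<&'a NavigableNode<'a, T>>,
}

impl<'a, T> NavigableNode<'a, T> {
    fn child(&self, index: usize) -> NavigableNode<'_, T> {
        NavigableNode { node: &self.node.children[index], parent: Some(self) }
    }

    /// Hypothetical helper: number of ancestors above this node.
    fn depth(&self) -> usize {
        let mut depth = 0;
        let mut current = self.parent;
        while let Some(p) = current {
            depth += 1;
            current = p.parent;
        }
        depth
    }
}

/// Hypothetical helper: pre-order traversal recording (data, depth) pairs.
/// Each parent NavigableNode lives on the call stack of the recursion,
/// which is why borrowed parent links are enough here.
fn collect_depths<T: Clone>(nav: &NavigableNode<'_, T>, out: &mut Vec<(T, usize)>) {
    out.push((nav.node.data.clone(), nav.depth()));
    for i in 0..nav.node.children.len() {
        let child = nav.child(i);
        collect_depths(&child, out);
    }
}
```

Calling collect_depths(&root.navigate(), &mut out) fills out in pre-order. Note that nothing here can mutate the tree, because every NavigableNode only holds shared borrows.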


Now, let's talk about zippers. In functional programming, zippers are used to "focus" on a particular element of a data structure (list, tree, map, etc.) so that accessing that element takes constant time, while still retaining all the data of that data structure. If you need to navigate your tree and mutate it during the navigation, while retaining the ability to navigate up the tree, then you could turn a tree into a zipper and perform the modifications through the zipper.

Here's how we could implement a zipper for the Node defined above:

#[derive(Debug)]
struct NodeZipper<T> {
    node: Node<T>,
    parent: Option<Box<NodeZipper<T>>>,
    index_in_parent: usize,
}

impl<T> NodeZipper<T> {
    fn child(mut self, index: usize) -> NodeZipper<T> {
        // Remove the specified child from the node's children.
        // A NodeZipper shouldn't let its users inspect its parent,
        // since we mutate the parents
        // to move the focused nodes out of their list of children.
        // We use swap_remove() for efficiency.
        let child = self.node.children.swap_remove(index);

        // Return a new NodeZipper focused on the specified child.
        NodeZipper {
            node: child,
            parent: Some(Box::new(self)),
            index_in_parent: index,
        }
    }

    fn parent(self) -> NodeZipper<T> {
        // Destructure this NodeZipper
        let NodeZipper { node, parent, index_in_parent } = self;

        // Destructure the parent NodeZipper
        let NodeZipper {
            node: mut parent_node,
            parent: parent_parent,
            index_in_parent: parent_index_in_parent,
        } = *parent.unwrap();

        // Insert the node of this NodeZipper back in its parent.
        // Since we used swap_remove() to remove the child,
        // we need to do the opposite of that.
        parent_node.children.push(node);
        let len = parent_node.children.len();
        parent_node.children.swap(index_in_parent, len - 1);

        // Return a new NodeZipper focused on the parent.
        NodeZipper {
            node: parent_node,
            parent: parent_parent,
            index_in_parent: parent_index_in_parent,
        }
    }

    fn finish(mut self) -> Node<T> {
        while let Some(_) = self.parent {
            self = self.parent();
        }

        self.node
    }
}

impl<T> Node<T> {
    fn zipper(self) -> NodeZipper<T> {
        NodeZipper { node: self, parent: None, index_in_parent: 0 }
    }
}

To use this zipper, you need to have ownership of the root node of the tree. By taking ownership of the nodes, the zipper can move things around in order to avoid copying or cloning nodes. When we move a zipper, we actually drop the old zipper and create a new one (though we could also do it by mutating self, but I thought it was clearer that way, plus it lets you chain method calls).
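As a rough usage sketch, the following condensed copy of the zipper replaces the data of a node addressed by a path of child indices and then rebuilds the tree. The node constructor and the set_at_path function are hypothetical helpers written for this example; the zipper methods follow the same swap_remove() and swap() logic as the code above, only shortened.

```rust
#[derive(Debug, PartialEq)]
struct Node<T> {
    data: T,
    children: Vec<Node<T>>,
}

/// Hypothetical convenience constructor, used only in this example.
fn node<T>(data: T, children: Vec<Node<T>>) -> Node<T> {
    Node { data, children }
}

struct NodeZipper<T> {
    node: Node<T>,
    parent: Option<Box<NodeZipper<T>>>,
    index_in_parent: usize,
}

impl<T> NodeZipper<T> {
    fn child(mut self, index: usize) -> NodeZipper<T> {
        // Move the focused child out of its parent's list.
        let child = self.node.children.swap_remove(index);
        NodeZipper { node: child, parent: Some(Box::new(self)), index_in_parent: index }
    }

    fn parent(self) -> NodeZipper<T> {
        let NodeZipper { node, parent, index_in_parent } = self;
        let mut up = *parent.expect("already at the root");
        // Undo swap_remove(): push, then swap back into the original slot.
        up.node.children.push(node);
        let last = up.node.children.len() - 1;
        up.node.children.swap(index_in_parent, last);
        up
    }

    fn finish(mut self) -> Node<T> {
        while self.parent.is_some() {
            self = self.parent();
        }
        self.node
    }
}

/// Hypothetical helper: descend along `path`, overwrite the data found
/// there, then walk back up so the caller gets the whole tree back.
fn set_at_path<T>(root: Node<T>, path: &[usize], value: T) -> Node<T> {
    let mut zipper = NodeZipper { node: root, parent: None, index_in_parent: 0 };
    for &index in path {
        zipper = zipper.child(index);
    }
    zipper.node.data = value;
    zipper.finish()
}
```

Because every step consumes the zipper and returns a new one, the tree is never aliased while it is being modified, and the child order is the same after finish() as it was before.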


If the above options are not satisfactory, and you absolutely must store the parent of a node in the node itself, then the next best option is to use Rc<RefCell<Node<T>>> to own the children and Weak<RefCell<Node<T>>> to refer to the parent. Rc enables shared ownership, but adds overhead to perform reference counting at runtime. RefCell enables interior mutability, but adds overhead to keep track of the active borrows at runtime. Weak is like Rc, but it doesn't increment the strong reference count, so it doesn't keep the value alive; this is used to break reference cycles, which would otherwise prevent the reference count from dropping to zero and cause a memory leak. See DK.'s answer for an implementation using Rc, Weak and RefCell.
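Here is a minimal sketch of how these three types can fit together, in the arrangement where the parent owns its children through Rc (as the question asks) and each child keeps only a Weak back-pointer to its parent. The Link alias and the new_node, add_child and parent_data functions are names made up for this example.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Node<T> {
    data: T,
    // Weak: the child does not keep its parent alive, so there is no cycle.
    parent: Option<Weak<RefCell<Node<T>>>>,
    // Rc: the parent keeps its children alive.
    children: Vec<Rc<RefCell<Node<T>>>>,
}

type Link<T> = Rc<RefCell<Node<T>>>;

fn new_node<T>(data: T) -> Link<T> {
    Rc::new(RefCell::new(Node { data, parent: None, children: Vec::new() }))
}

fn add_child<T>(parent: &Link<T>, child: Link<T>) {
    child.borrow_mut().parent = Some(Rc::downgrade(parent));
    parent.borrow_mut().children.push(child);
}

/// Returns a copy of the parent's data, or None if there is no parent
/// or the parent has already been dropped.
fn parent_data<T: Clone>(node: &Link<T>) -> Option<T> {
    // upgrade() turns the Weak back into an Rc if the value is still alive.
    let parent = node.borrow().parent.as_ref()?.upgrade()?;
    let data = parent.borrow().data.clone();
    Some(data)
}
```

If the last Rc to a parent is dropped while a child is still held elsewhere, the child's upgrade() simply returns None instead of dangling.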

The problem is that this data structure is inherently unsafe; it doesn't have a direct equivalent in Rust that doesn't use unsafe. This is by design.

If you want to translate this into safe Rust code, you need to be more specific about what, exactly, you want from it. I know you listed some properties above, but often people coming to Rust will say "I want everything I have in this C/C++ code", to which the direct answer is "well, you can't."

You're also, unavoidably, going to have to change how you approach this. The example you've given has pointers without any ownership semantics, mutable aliasing, and cycles; all of which Rust will not allow you to simply ignore like C++ does.

The simplest solution is to just get rid of the parent pointer, and maintain that externally (like a filesystem path). This also plays nicely with borrowing because there are no cycles anywhere:

pub struct Node1 {
    children: Vec<Node1>,
}

If you need parent pointers, you could go half-way and use Ids instead:

use std::collections::BTreeMap;

type Id = usize;

pub struct Tree {
    descendants: BTreeMap<Id, Node2>,
    root: Option<Id>,
}

pub struct Node2 {
    parent: Id,
    children: Vec<Id>,
}

The BTreeMap is effectively your "address space", bypassing borrowing and aliasing issues by not directly using memory addresses.

Of course, this introduces the problem of a given Id not being tied to the particular tree, meaning that the node it belongs to could be destroyed, and now you have what is effectively a dangling pointer. But, that's the price you pay for having aliasing and mutation. It's also less direct.
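A small sketch of what the Id approach might look like in use, including the "dangling Id" case. This is an assumption-laden variation on the types above: the next_id counter, the Option<Id> parent, and the add, remove, contains and parent_of methods are all invented for this example.

```rust
use std::collections::BTreeMap;

type Id = usize;

#[derive(Default)]
struct Tree {
    nodes: BTreeMap<Id, Node>,
    next_id: Id, // Ids are never reused in this sketch
}

struct Node {
    parent: Option<Id>,
    children: Vec<Id>,
}

impl Tree {
    /// Insert a node under `parent` (or as a root when `parent` is None).
    /// Returns None if `parent` does not name a live node.
    fn add(&mut self, parent: Option<Id>) -> Option<Id> {
        let id = self.next_id;
        if let Some(p) = parent {
            self.nodes.get_mut(&p)?.children.push(id);
        }
        self.next_id += 1;
        self.nodes.insert(id, Node { parent, children: Vec::new() });
        Some(id)
    }

    /// Remove a node and its whole subtree. Any copy of a removed Id that
    /// is still held elsewhere becomes stale.
    fn remove(&mut self, id: Id) {
        if let Some(node) = self.nodes.remove(&id) {
            if let Some(p) = node.parent {
                if let Some(parent) = self.nodes.get_mut(&p) {
                    parent.children.retain(|&c| c != id);
                }
            }
            for child in node.children {
                self.remove(child);
            }
        }
    }

    fn contains(&self, id: Id) -> bool {
        self.nodes.contains_key(&id)
    }

    fn parent_of(&self, id: Id) -> Option<Id> {
        self.nodes.get(&id)?.parent
    }
}
```

A stale Id here is a lookup that returns None or false rather than undefined behaviour, but the compiler won't catch it for you; every access has to handle the missing case.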

Or, you could go whole-hog and use reference-counting and dynamic borrow checking:

use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Note: do not derive Clone to make this move-only.
pub struct Node3(Rc<RefCell<Node3_>>);

pub type WeakNode3 = Weak<RefCell<Node3>>;

pub struct Node3_ {
    parent: Option<WeakNode3>,
    children: Vec<Node3>,
}

impl Node3 {
    pub fn add(&self, node: Node3) {
        // No need to remove from old parent; move semantics mean that must have
        // already been done.
        (node.0).borrow_mut().parent = Some(Rc::downgrade(&self.0));
        self.children.push(node);
    }
}

Here, you'd use Node3 to transfer ownership of a node between parts of the tree, and WeakNode3 for external references. Or, you could make Node3 cloneable and add back the logic in add to make sure a given node doesn't accidentally stay a child of the wrong parent.

This is not strictly better than the second option, because this design absolutely cannot benefit from static borrow-checking. The second one can at least prevent you from mutating the graph from two places at once at compile time; here, if that happens, you'll just crash (RefCell panics at run time when its borrow rules are violated).
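To make the "crash" concrete, here is a minimal sketch of a conflicting borrow on an Rc<RefCell<...>>. The Node type and the second_borrow_is_rejected function are hypothetical. It uses try_borrow_mut, which reports the conflict as an Err; calling borrow_mut at the same spot would panic instead.

```rust
use std::cell::RefCell;
use std::rc::Rc;

struct Node {
    value: i32,
}

/// Returns true if a second mutable borrow is refused while the first
/// one is still alive.
fn second_borrow_is_rejected(node: &Rc<RefCell<Node>>) -> bool {
    // Two handles to the same node: the compiler cannot see that they alias.
    let alias = Rc::clone(node);

    // First mutable borrow, held until the end of this function.
    let _first = node.borrow_mut();

    // The conflict is only detected at run time.
    // `alias.borrow_mut()` would panic here; try_borrow_mut returns Err.
    let rejected = alias.try_borrow_mut().is_err();
    rejected
}
```

Once the first borrow is dropped, borrowing again works as usual, so the failure depends entirely on what happens to be borrowed at that moment.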

The point is: you can't just have everything. You have to decide which operations you actually need to support. At that point, it's usually just a case of picking the types that give you the necessary properties.

In certain cases, you can also use an arena. An arena guarantees that values stored in it will have the same lifetime as the arena itself. This means that adding more values will not invalidate any existing lifetimes, but moving the arena will. Thus, such a solution is not viable if you need to return the tree.

This solves the problem by removing the ownership from the nodes themselves.

Here's an example that also uses interior mutability to allow a node to be mutated after it is created. In other cases, you can remove this mutability if the tree is constructed once and then simply navigated.

use std::{
    cell::{Cell, RefCell},
    fmt,
};
use typed_arena::Arena; // 1.6.1

struct Tree<'a, T: 'a> {
    nodes: Arena<Node<'a, T>>,
}

impl<'a, T> Tree<'a, T> {
    fn new() -> Tree<'a, T> {
        Self {
            nodes: Arena::new(),
        }
    }

    fn new_node(&'a self, data: T) -> &'a mut Node<'a, T> {
        self.nodes.alloc(Node {
            data,
            tree: self,
            parent: Cell::new(None),
            children: RefCell::new(Vec::new()),
        })
    }
}

struct Node<'a, T: 'a> {
    data: T,
    tree: &'a Tree<'a, T>,
    parent: Cell<Option<&'a Node<'a, T>>>,
    children: RefCell<Vec<&'a Node<'a, T>>>,
}

impl<'a, T> Node<'a, T> {
    fn add_node(&'a self, data: T) -> &'a Node<'a, T> {
        let child = self.tree.new_node(data);
        child.parent.set(Some(self));
        self.children.borrow_mut().push(child);
        child
    }
}

impl<'a, T> fmt::Debug for Node<'a, T>
where
    T: fmt::Debug,
{
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "{:?}", self.data)?;
        write!(f, " (")?;
        for c in self.children.borrow().iter() {
            write!(f, "{:?}, ", c)?;
        }
        write!(f, ")")
    }
}

fn main() {
    let tree = Tree::new();
    let head = tree.new_node(1);
    let _left = head.add_node(2);
    let _right = head.add_node(3);

    println!("{:?}", head); // 1 (2 (), 3 (), )
}

TL;DR: DK.'s reference-counted version (Node3) doesn't compile because the parent field has a different type than what Rc::downgrade(&self.0) produces; fix it by converting the result to a WeakNode. Also, in the line directly below, self doesn't have a children field, but the value inside self.0 does.


I corrected DK.'s version so that it compiles and works. Here is my code:

dk_tree.rs

use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Note: do not derive Clone to make this move-only.
pub struct Node(Rc<RefCell<Node_>>);


pub struct WeakNode(Weak<RefCell<Node_>>);

struct Node_ {
    parent: Option<WeakNode>,
    children: Vec<Node>,
}

impl Node {
    pub fn new() -> Self {
        Node(Rc::new(RefCell::new(Node_ {
            parent: None,
            children: Vec::new(),
        })))
    }
    pub fn add(&self, node: Node) {
        // No need to remove from old parent; move semantics mean that must have
        // already been done.
        node.0.borrow_mut().parent = Some(WeakNode(Rc::downgrade(&self.0)));
        self.0.borrow_mut().children.push(node);
    }
    // just to have something visually printed
    pub fn to_str(&self) -> String {
        let mut result_string = "[".to_string();
        for child in self.0.borrow().children.iter() {
            result_string.push_str(&format!("{},", child.to_str()));
        }
        result_string += "]";
        result_string
    }
}

and then the main function in main.rs:

mod dk_tree;

use crate::dk_tree::{Node};


fn main() {
    let root = Node::new();
    root.add(Node::new());
    root.add(Node::new());
    let inner_child = Node::new();
    inner_child.add(Node::new());
    inner_child.add(Node::new());
    root.add(inner_child);
    let result = root.to_str();
    println!("{result:?}");
}

The reason I made WeakNode a wrapper struct like Node is to make conversion between the two easier.
