
Struggling with closures and lifetimes in Rust

I'm trying to port a little benchmark from F# to Rust. The F# code looks like this:

let inline iterNeighbors f (i, j) =
  f (i-1, j)
  f (i+1, j)
  f (i, j-1)
  f (i, j+1)

let rec nthLoop n (s1: HashSet<_>) (s2: HashSet<_>) =
  match n with
  | 0 -> s1
  | n ->
      let s0 = HashSet(HashIdentity.Structural)
      let add p =
        if not(s1.Contains p || s2.Contains p) then
          ignore(s0.Add p)
      Seq.iter (fun p -> iterNeighbors add p) s1
      nthLoop (n-1) s0 s1

let nth n p =
  nthLoop n (HashSet([p], HashIdentity.Structural)) (HashSet(HashIdentity.Structural))

(nth 2000 (0, 0)).Count

It computes the nth-nearest neighbor shells from an initial vertex in a potentially infinite graph. I used something similar during my PhD to study amorphous materials.

I've spent many hours trying and failing to port this to Rust. I have managed to get one version working, but only by manually inlining the closure and converting the recursion into a loop with local mutables (yuk!).

I tried writing the iterNeighbors function like this:

use std::collections::HashSet;

fn iterNeighbors<F>(f: &F, (i, j): (i32, i32)) -> ()
where
    F: Fn((i32, i32)) -> (),
{
    f((i - 1, j));
    f((i + 1, j));
    f((i, j - 1));
    f((i, j + 1));
}

I think that is a function that accepts a closure (that itself accepts a pair and returns unit) and a pair, and returns unit. I seem to have to double bracket things: is that correct?

I tried writing a recursive version like this:

fn nthLoop(n: i32, s1: HashSet<(i32, i32)>, s2: HashSet<(i32, i32)>) -> HashSet<(i32, i32)> {
    if n == 0 {
        return &s1;
    } else {
        let mut s0 = HashSet::new();
        for &p in s1 {
            if !(s1.contains(&p) || s2.contains(&p)) {
                s0.insert(p);
            }
        }
        return &nthLoop(n - 1, s0, s1);
    }
}

Note that I haven't even bothered with the call to iterNeighbors yet.

I think I'm struggling to get the lifetimes of the arguments correct because they are rotated in the recursive call. How should I annotate the lifetimes if I want s2 to be deallocated just before the returns and I want s1 to survive either when returned or into the recursive call?

The caller would look something like this:

fn nth<'a>(n: i32, p: (i32, i32)) -> &'a HashSet<(i32, i32)> {
    let s0 = HashSet::new();
    let mut s1 = HashSet::new();
    s1.insert(p);
    return &nthLoop(n, &s1, s0);
}

I gave up on that and wrote it as a while loop with mutable locals instead:

fn nth<'a>(n: i32, p: (i32, i32)) -> HashSet<(i32, i32)> {
    let mut n = n;
    let mut s0 = HashSet::new();
    let mut s1 = HashSet::new();
    let mut s2 = HashSet::new();
    s1.insert(p);
    while n > 0 {
        for &p in &s1 {
            let add = &|p| {
                if !(s1.contains(&p) || s2.contains(&p)) {
                    s0.insert(p);
                }
            };
            iterNeighbors(&add, p);
        }
        std::mem::swap(&mut s0, &mut s1);
        std::mem::swap(&mut s0, &mut s2);
        s0.clear();
        n -= 1;
    }
    return s1;
}

This works if I inline the closure by hand, but I cannot figure out how to invoke the closure. Ideally, I'd like static dispatch here.

The main function is then:

fn main() {
    let s = nth(2000, (0, 0));
    println!("{}", s.len());
}

So... what am I doing wrong? :-)

Also, I only used HashSet in the F# because I assume Rust doesn't provide a purely functional Set with efficient set-theoretic operations (union, intersection and difference). Am I correct in assuming that?

I seem to have to double bracket things: is that correct?

No: the double brackets are there because you've chosen to use tuples, and calling a function that takes a tuple requires creating the tuple first. Closures can also take multiple arguments, like F: Fn(i32, i32). That is, one could write that function as:

fn iterNeighbors<F>(i: i32, j: i32, f: F)
where
    F: Fn(i32, i32),
{
    f(i - 1, j);
    f(i + 1, j);
    f(i, j - 1);
    f(i, j + 1);
}
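For illustration, a caller of this two-argument form might look like the sketch below. The names iter_neighbors2 and neighbor_coordinate_sum are hypothetical, and the Cell is an assumption used only so that an Fn closure (which cannot mutate its captures directly) can still record a result.

```rust
use std::cell::Cell;

// Two-argument variant: the closure receives `i` and `j` separately,
// so no extra pair of brackets is needed at the call sites.
fn iter_neighbors2<F>(i: i32, j: i32, f: F)
where
    F: Fn(i32, i32),
{
    f(i - 1, j);
    f(i + 1, j);
    f(i, j - 1);
    f(i, j + 1);
}

// Hypothetical helper: adds up i + j over the four neighbours.
// `Cell` provides interior mutability, which an `Fn` closure may use.
fn neighbor_coordinate_sum(i: i32, j: i32) -> i32 {
    let total = Cell::new(0);
    iter_neighbors2(i, j, |a, b| total.set(total.get() + a + b));
    total.get()
}
```

Calling neighbor_coordinate_sum(2, 3) visits (1, 3), (3, 3), (2, 2) and (2, 4).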

However, it seems that retaining the tuples makes sense for this case.

I think I'm struggling to get the lifetimes of the arguments correct because they are rotated in the recursive call. How should I annotate the lifetimes if I want s2 to be deallocated just before the returns and I want s1 to survive either when returned or into the recursive call?

No need for references (and hence no need for lifetimes); just pass the data through directly:

fn nthLoop(n: i32, s1: HashSet<(i32, i32)>, s2: HashSet<(i32, i32)>) -> HashSet<(i32, i32)> {
    if n == 0 {
        return s1;
    } else {
        let mut s0 = HashSet::new();
        for &p in &s1 {
            iterNeighbors(p, |p| {
                if !(s1.contains(&p) || s2.contains(&p)) {
                    s0.insert(p);
                }
            })
        }
        drop(s2); // guarantees timely deallocation
        return nthLoop(n - 1, s0, s1);
    }
}

The key here is that you can do everything by value: sets passed around by value are moved, not copied, so their contents stay alive for as long as some owner holds them.

However, this fails to compile:

error[E0387]: cannot borrow data mutably in a captured outer variable in an `Fn` closure
  --> src/main.rs:21:21
   |
21 |                     s0.insert(p);
   |                     ^^
   |
help: consider changing this closure to take self by mutable reference
  --> src/main.rs:19:30
   |
19 |               iterNeighbors(p, |p| {
   |  ______________________________^
20 | |                 if !(s1.contains(&p) || s2.contains(&p)) {
21 | |                     s0.insert(p);
22 | |                 }
23 | |             })
   | |_____________^

That is to say, the closure is trying to mutate a value it captures (s0), but the Fn closure trait doesn't allow this. An Fn closure can be called in a more flexible manner (through a shared reference), but this imposes more restrictions on what the closure can do internally. (If you're interested, I've written more about this.)

Fortunately there's an easy fix: using the FnMut trait, which requires that the closure can only be called when one has unique access to it, but allows the internals to mutate things.

fn iterNeighbors<F>((i, j): (i32, i32), mut f: F)
where
    F: FnMut((i32, i32)),
{
    f((i - 1, j));
    f((i + 1, j));
    f((i, j - 1));
    f((i, j + 1));
}
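As a minimal sketch of what the FnMut bound permits, the closure below pushes into a captured Vec, which an Fn bound would reject. The snake_case name iter_neighbors and the helper neighbors_of are illustrative choices, not part of the original code.

```rust
// Same shape as the FnMut version above, spelled in snake_case.
fn iter_neighbors<F>((i, j): (i32, i32), mut f: F)
where
    F: FnMut((i32, i32)),
{
    f((i - 1, j));
    f((i + 1, j));
    f((i, j - 1));
    f((i, j + 1));
}

// Hypothetical helper: collects the four neighbours of `p`.
// The closure mutates the captured `out`, so it is FnMut but not Fn.
fn neighbors_of(p: (i32, i32)) -> Vec<(i32, i32)> {
    let mut out = Vec::new();
    iter_neighbors(p, |q| out.push(q));
    out
}
```

The mutable borrow of out ends when the closure is dropped at the end of the call, so out can be returned afterwards.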

The caller would look something like this:

Values work here too: returning a reference in that case would mean returning a pointer to a local value (the set returned by nthLoop), which lives in the stack frame that is being destroyed as the function returns. That is, the reference would point to dead data.

The fix is simply not to use references:

fn nth(n: i32, p: (i32, i32)) -> HashSet<(i32, i32)> {
    let s0 = HashSet::new();
    let mut s1 = HashSet::new();
    s1.insert(p);
    return nthLoop(n, s1, s0);
}

This works if I inline the closure by hand but I cannot figure out how to invoke the closure. Ideally, I'd like static dispatch here.

(I don't understand what this means; including the compiler error messages you're having trouble with helps us help you.)

Also, I only used HashSet in the F# because I assume Rust doesn't provide a purely functional Set with efficient set-theoretic operations (union, intersection and difference). Am I correct in assuming that?

Depending on exactly what you want, no: e.g. both HashSet and BTreeSet provide various set-theoretic operations as methods which return iterators.
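A small sketch of those iterator-returning methods on HashSet follows. The helper names sorted and set_ops_demo are made up for this example, and the results are sorted only because HashSet iteration order is unspecified.

```rust
use std::collections::HashSet;

// Hypothetical helper: copies the borrowed items out and sorts them,
// so the result does not depend on hash iteration order.
fn sorted<'a>(it: impl Iterator<Item = &'a i32>) -> Vec<i32> {
    let mut v: Vec<i32> = it.cloned().collect();
    v.sort();
    v
}

// Returns (union, intersection, difference) of two small sets.
fn set_ops_demo() -> (Vec<i32>, Vec<i32>, Vec<i32>) {
    let a: HashSet<i32> = [1, 2, 3].iter().cloned().collect();
    let b: HashSet<i32> = [2, 3, 4].iter().cloned().collect();
    (
        sorted(a.union(&b)),        // items in a or b
        sorted(a.intersection(&b)), // items in both
        sorted(a.difference(&b)),   // items in a but not in b
    )
}
```

Each method borrows both sets and yields references, so no new set is built unless the caller collects one.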


Some small points:

  • explicit/named lifetimes allow the compiler to reason about the static validity of data; they don't control it (i.e. they allow the compiler to point out when you do something wrong, but the language still has the same sort of static resource usage/life-cycle guarantees as C++)
  • the version with a loop is likely to be more efficient as written, as it reuses memory directly (swapping the sets, plus the s0.clear()). However, the same benefit can be realised with a recursive version by passing s2 down for reuse instead of dropping it.
  • the while loop could be for _ in 0..n
  • there's no need to pass closures by reference, but with or without the reference, there's still static dispatch (the closure is a type parameter, not a trait object).
  • conventionally, closure arguments are last, and not taken by reference, because it makes defining and passing them inline easier to read (e.g. foo(x, |y| bar(y + 1)) instead of foo(&|y| bar(y + 1), x))
  • the return keyword isn't necessary for trailing returns (if the ; is omitted):

     fn nth(n: i32, p: (i32, i32)) -> HashSet<(i32, i32)> {
         let s0 = HashSet::new();
         let mut s1 = HashSet::new();
         s1.insert(p);
         nthLoop(n, s1, s0)
     }
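Several of the points above (reusing s2 in the recursion, for _ in 0..n, closure passed last and by value, no trailing return) can be combined in one sketch. This is one possible arrangement under stated assumptions, not the only one: the Point alias, the snake_case names, the extra scratch parameter and the use of u32 for n are all choices made for this example.

```rust
use std::collections::HashSet;

type Point = (i32, i32);

fn iter_neighbors<F: FnMut(Point)>((i, j): Point, mut f: F) {
    f((i - 1, j));
    f((i + 1, j));
    f((i, j - 1));
    f((i, j + 1));
}

// Recursive form: `s1` is the current shell, `s2` the previous one, and
// `scratch` is an old set whose allocation is reused for the next shell.
fn nth_loop(
    n: u32,
    s1: HashSet<Point>,
    s2: HashSet<Point>,
    mut scratch: HashSet<Point>,
) -> HashSet<Point> {
    if n == 0 {
        return s1;
    }
    scratch.clear(); // keeps the capacity, drops the stale points
    for &p in &s1 {
        iter_neighbors(p, |q| {
            if !(s1.contains(&q) || s2.contains(&q)) {
                scratch.insert(q);
            }
        });
    }
    // Rotate: the new shell becomes current, and `s2` is handed down for reuse.
    nth_loop(n - 1, scratch, s1, s2)
}

fn nth(n: u32, p: Point) -> HashSet<Point> {
    let mut s1 = HashSet::new();
    s1.insert(p);
    nth_loop(n, s1, HashSet::new(), HashSet::new())
}

// Loop form of the same computation, calling the closure instead of inlining it.
fn nth_iterative(n: u32, p: Point) -> HashSet<Point> {
    let mut s0 = HashSet::new();
    let mut s1 = HashSet::new();
    let mut s2 = HashSet::new();
    s1.insert(p);
    for _ in 0..n {
        for &p in &s1 {
            iter_neighbors(p, |q| {
                if !(s1.contains(&q) || s2.contains(&q)) {
                    s0.insert(q);
                }
            });
        }
        std::mem::swap(&mut s0, &mut s1); // s1 is now the new shell
        std::mem::swap(&mut s0, &mut s2); // s2 is now the old current shell
        s0.clear();
    }
    s1
}
```

Both functions should produce the same shell for a given n. The closure borrows s1 and s2 immutably and the output set mutably, which the borrow checker accepts because the three sets are distinct variables.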

I think that is a function that accepts a closure (that itself accepts a pair and returns unit) and a pair, and returns unit. I seem to have to double bracket things: is that correct?

You need the double brackets because you're passing a 2-tuple to the closure, which matches your original F# code.

I think I'm struggling to get the lifetimes of the arguments correct because they are rotated in the recursive call. How should I annotate the lifetimes if I want s2 to be deallocated just before the returns and I want s1 to survive either when returned or into the recursive call?

The problem is that you're using references to HashSets when you should just use HashSets directly. Your signature for nthLoop is already correct; you just need to remove a few occurrences of &.

To deallocate s2, you can write drop(s2). Note that Rust doesn't have guaranteed tail calls, so each recursive call will still take a bit of stack space (you can estimate how much by checking the size of the HashSet handles with the mem::size_of function), but the drop call will free the data on the heap.
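To make the effect of an explicit drop visible, here is a small sketch with a made-up type Noisy that records when it is destroyed. The type, its fields and the drop_order function are assumptions for illustration only; a HashSet behaves the same way with respect to timing, it just frees its heap buffer instead of writing to a log.

```rust
use std::cell::RefCell;

// Hypothetical type that appends its name to a shared log when destroyed.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

// Returns the order in which events happened.
fn drop_order() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let early = Noisy { name: "early", log: &log };
        let _late = Noisy { name: "late", log: &log };
        drop(early); // destroyed right here, before any further work
        log.borrow_mut().push("work");
        // `_late` is destroyed only when this scope ends
    }
    log.into_inner()
}
```

Without the drop call, early would be destroyed at the end of the scope, after the "work" entry.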

The caller would look something like this:

Again, you just need to remove the &'s here.

Note that I haven't even bothered with the call to iterNeighbors yet.


This works if I inline the closure by hand but I cannot figure out how to invoke the closure. Ideally, I'd like static dispatch here.

There are three closure traits in Rust: Fn, FnMut and FnOnce. They differ by the type of their self argument (&self, &mut self and self, respectively). The distinction is important because it puts restrictions on what the closure is allowed to do and on how the caller can use the closure. The Rust book has a chapter on closures that already explains this well.

Your closure needs to mutate s0. However, iterNeighbors is defined as expecting an Fn closure. Your closure cannot implement Fn because Fn receives &self, but to mutate s0, you need &mut self. iterNeighbors cannot use FnOnce, since it needs to call the closure more than once. Therefore, you need to use FnMut.
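The three traits can be contrasted in a short sketch. All function names here (call_fn, call_fn_mut, call_fn_once, closure_traits_demo) are hypothetical and exist only for this illustration.

```rust
// Fn: callable through a shared reference, any number of times.
fn call_fn<F: Fn() -> usize>(f: F) -> usize {
    f() + f()
}

// FnMut: needs unique access for each call, may mutate its captures.
fn call_fn_mut<F: FnMut() -> usize>(mut f: F) -> usize {
    f();
    f()
}

// FnOnce: consumed by the call, so it can give away what it captured.
fn call_fn_once<F: FnOnce() -> Vec<i32>>(f: F) -> Vec<i32> {
    f()
}

fn closure_traits_demo() -> (usize, usize, Vec<i32>) {
    let v = vec![1, 2, 3];
    let read = call_fn(|| v.len()); // only reads `v`
    let mut n = 0;
    let counted = call_fn_mut(|| {
        n += 1; // mutates the captured counter
        n
    });
    let moved = call_fn_once(move || v); // moves `v` out, callable once
    (read, counted, moved)
}
```

The closure that mutates n would be rejected by call_fn, and the closure that moves v out would be rejected by both call_fn and call_fn_mut.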

Also, it's not necessary to pass the closure by reference to iterNeighbors. You can just pass it by value; each call to the closure will only borrow the closure, not consume it.
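If the caller does want to keep using the same closure after the call, one option is to lend it with &mut, since a mutable reference to an FnMut closure is itself FnMut. The sketch below assumes a by-value, closure-last signature and uses the hypothetical names iter_neighbors and count_calls.

```rust
fn iter_neighbors<F>((i, j): (i32, i32), mut f: F)
where
    F: FnMut((i32, i32)),
{
    f((i - 1, j));
    f((i + 1, j));
    f((i, j - 1));
    f((i, j + 1));
}

// Counts how many times the closure is invoked across three calls.
fn count_calls() -> usize {
    let mut calls = 0;
    let mut bump = |_p: (i32, i32)| calls += 1;
    iter_neighbors((0, 0), &mut bump); // lend the closure, keep ownership
    iter_neighbors((5, 5), &mut bump); // still usable here
    iter_neighbors((9, 9), bump);      // finally move it in
    calls
}
```

In all three calls F is a concrete type chosen at compile time, so dispatch stays static.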

Also, I only used HashSet in the F# because I assume Rust doesn't provide a purely functional Set with efficient set-theoretic operations (union, intersection and difference). Am I correct in assuming that?

There's no purely functional set implementation in the standard library (maybe there's one on crates.io?). While Rust embraces functional programming, it also takes advantage of its ownership and borrowing system to make imperative programming safer. A functional set would probably require some form of reference counting or garbage collection in order to share items across sets.

However, HashSet does implement set-theoretic operations. There are two ways to use them: iterators (difference, symmetric_difference, intersection, union), which generate the sequence lazily, or operators (|, &, ^, -, as listed in the trait implementations for HashSet), which produce new sets containing clones of the values from the source sets.
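The two styles can be compared in a brief sketch. The function names are invented for the example, and the sets are sorted into Vecs only to make the results order-independent.

```rust
use std::collections::HashSet;

// Hypothetical helper: turns a set into a sorted Vec.
fn to_sorted(s: HashSet<i32>) -> Vec<i32> {
    let mut v: Vec<i32> = s.into_iter().collect();
    v.sort();
    v
}

fn sample_sets() -> (HashSet<i32>, HashSet<i32>) {
    let a = [1, 2, 3].iter().cloned().collect();
    let b = [2, 3, 4].iter().cloned().collect();
    (a, b)
}

// Operator style: each operator takes two references and builds a new set.
fn operator_demo() -> (Vec<i32>, Vec<i32>, Vec<i32>, Vec<i32>) {
    let (a, b) = sample_sets();
    (
        to_sorted(&a | &b), // union
        to_sorted(&a & &b), // intersection
        to_sorted(&a ^ &b), // symmetric difference
        to_sorted(&a - &b), // difference
    )
}

// Iterator style: nothing is cloned or allocated, the items are just counted.
fn lazy_symmetric_difference_len() -> usize {
    let (a, b) = sample_sets();
    a.symmetric_difference(&b).count()
}
```

The operators are implemented on references, so a and b remain usable after each expression.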


Here's the working code:

use std::collections::HashSet;

fn iterNeighbors<F>(mut f: F, (i, j): (i32, i32)) -> ()
where
    F: FnMut((i32, i32)) -> (),
{
    f((i - 1, j));
    f((i + 1, j));
    f((i, j - 1));
    f((i, j + 1));
}

fn nthLoop(n: i32, s1: HashSet<(i32, i32)>, s2: HashSet<(i32, i32)>) -> HashSet<(i32, i32)> {
    if n == 0 {
        return s1;
    } else {
        let mut s0 = HashSet::new();
        for &p in &s1 {
            let add = |p| {
                if !(s1.contains(&p) || s2.contains(&p)) {
                    s0.insert(p);
                }
            };
            iterNeighbors(add, p);
        }
        drop(s2);
        return nthLoop(n - 1, s0, s1);
    }
}

fn nth(n: i32, p: (i32, i32)) -> HashSet<(i32, i32)> {
    let mut s1 = HashSet::new();
    s1.insert(p);
    let s2 = HashSet::new();
    return nthLoop(n, s1, s2);
}

fn main() {
    let s = nth(2000, (0, 0));
    println!("{}", s.len());
}
