
Why does Rust disallow mutable aliasing?

Rust disallows this kind of code because it is unsafe:

fn main() {
    let mut i = 42;
    let ref_to_i_1 = unsafe { &mut *(&mut i as *mut i32) };
    let ref_to_i_2 = unsafe { &mut *(&mut i as *mut i32) };

    *ref_to_i_1 = 1;
    *ref_to_i_2 = 2;
}

How can I do something bad (e.g. a segmentation fault, undefined behavior, etc.) with multiple mutable references to the same thing?

The only possible issues I can see come from the lifetime of the data. Here, if i is alive, each mutable reference to it should be OK.

I can see how there might be problems when threads are introduced, but why is it prevented even if I do everything in one thread?

A really common pitfall in C++ programs, and even in Java programs, is modifying a collection while iterating over it, like this:

for (it: collection) {
    if (predicate(*it)) {
        collection.remove(it);
    }
}

For C++ standard library collections, this causes undefined behaviour. Maybe the iteration will work until you get to the last entry, but the last entry will dereference a dangling pointer or read off the end of an array. Maybe the whole array underlying the collection will be relocated, and it'll fail immediately. Maybe it works most of the time but fails if a reallocation happens at the wrong time. In most Java standard collections, the result is likewise unspecified according to the collections documentation, but the collections tend to throw ConcurrentModificationException - a check which causes a runtime cost even when your code is correct. Neither language can detect the error during compilation.

This is a common example of a data race caused by concurrency, even in a single-threaded environment. Concurrency doesn't just mean parallelism: it can also mean nested computation. In Rust, this kind of mistake is detected during compilation because the iterator holds an immutable borrow of the collection, so you can't mutate the collection while the iterator is alive.
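As a rough Rust counterpart to the loop above (a sketch only; the helper name `remove_matching` and its signature are made up for illustration), the direct translation is rejected by the borrow checker, while `Vec::retain` expresses the same intent under a single mutable borrow:

```rust
/// Removes every element for which `predicate` returns true.
/// (`remove_matching` is a hypothetical helper, not from the original post.)
fn remove_matching(collection: &mut Vec<i32>, predicate: impl Fn(&i32) -> bool) {
    // A direct translation of the loop above does not compile:
    //
    //     for item in collection.iter() {
    //         if predicate(item) {
    //             collection.retain(|x| x != item); // error[E0502]
    //         }
    //     }
    //
    // `iter()` keeps a shared borrow of the collection alive for the whole
    // loop, so the mutable borrow needed for the removal is rejected.

    // `retain` performs the removal in one pass under one mutable borrow.
    collection.retain(|item| !predicate(item));
}
```

Calling `remove_matching(&mut v, |x| x % 2 == 0)` on a vector of integers would drop the even ones; no iterator exists while the vector is being modified.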

An easier-to-understand but less common example is pointer aliasing when you pass multiple pointers (or references) to a function. A concrete example would be passing overlapping memory ranges to memcpy instead of memmove. Actually, Rust's memcpy equivalent (std::ptr::copy_nonoverlapping) is unsafe too, but that's because it takes pointers instead of references. The linked page shows how you can make a safe swap function using the guarantee that mutable references never alias.
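A minimal sketch of such a swap, assuming the usual `std::ptr` pattern (the function name `swap_values` is illustrative, not taken from the post). The unsafe block is sound only because two `&mut` arguments can never overlap:

```rust
use std::ptr;

/// Swaps two values using a `memcpy`-style copy.
/// (`swap_values` is an illustrative name chosen for this sketch.)
fn swap_values<T>(a: &mut T, b: &mut T) {
    // SAFETY: `a` and `b` are both `&mut`, so they are valid, aligned and
    // guaranteed not to overlap, which is what `copy_nonoverlapping` requires.
    unsafe {
        // Bitwise copy of `*a` into a temporary.
        let tmp = ptr::read(a);
        // `memcpy`-like copy of `*b` into `*a`.
        ptr::copy_nonoverlapping(b, a, 1);
        // Move the saved value into `*b`; nothing is dropped twice.
        ptr::write(b, tmp);
    }
}
```

If `a` and `b` were allowed to alias, the `copy_nonoverlapping` call would have the same problem as `memcpy` on overlapping ranges; the `&mut` signature rules that case out for safe callers.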

A more contrived example of reference aliasing is like this:

int f(int *x, int *y) { return (*x)++ + (*y)++; }
int i = 3;
f(&i, &i); // result is undefined

You couldn't write a function call like that in Rust because you'd have to take two mutable borrows of the same variable.
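For comparison, a Rust rendering of the C function might look like the sketch below (the wrapper `call_with_distinct_variables` is a made-up name for illustration). The aliased call is left as a comment because it does not compile:

```rust
/// Rust counterpart of the C function `f` above.
fn f(x: &mut i32, y: &mut i32) -> i32 {
    let sum = *x + *y;
    *x += 1;
    *y += 1;
    sum
}

/// Calls `f` with two distinct variables, which the borrow checker accepts.
fn call_with_distinct_variables() -> (i32, i32, i32) {
    let mut i = 3;
    let mut j = 3;
    // f(&mut i, &mut i);
    // ^ error[E0499]: cannot borrow `i` as mutable more than once at a time
    let sum = f(&mut i, &mut j);
    (sum, i, j)
}
```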

How can I do something bad (e.g. a segmentation fault, undefined behavior, etc.) with multiple mutable references to the same thing?

I believe that although you trigger 'undefined behavior' by doing this, at the time of writing the Rust compiler did not emit LLVM's noalias attribute for &mut references, so practically speaking you probably could not observe misbehavior this way. What you were triggering was 'implementation-specific behavior', which is 'behaves like C++ according to LLVM'. Later compiler releases do emit noalias for &mut references, so this should not be relied upon.

See Why does the Rust compiler not optimize code assuming that two mutable references cannot alias? for more information.

I can see how there might be problems when threads are introduced, but why is it prevented even if I do everything in one thread?

Have a read of this series of blog articles about undefined behavior.

In my opinion, race conditions (like iterators) aren't really a good example of what you're talking about; in a single-threaded environment you can avoid that sort of problem if you're careful. This is no different from creating an arbitrary pointer to invalid memory and writing to it; just don't do it. You're no worse off than using C.

To understand the issue here, consider that when compiling in release mode the compiler may reorder statements while optimizing; that means that although your code is written in the linear sequence:

a; b; c;

There is no guarantee that the compiled program will execute them in that sequence if, according to what the compiler knows, there is no logical reason that the statements must be performed in that specific order. Part 3 of the blog series mentioned above demonstrates how this can cause undefined behavior.

tl;dr: Basically, the compiler may perform various optimizations; these are guaranteed to preserve your program's behavior only if your program does not trigger undefined behavior.

As far as I'm aware, the Rust compiler currently doesn't use many 'advanced optimizations' that may cause this kind of failure, but there is no guarantee that it won't in the future. Introducing new compiler optimizations is not a 'breaking change'.

So... it's actually quite unlikely that you'll be able to trigger observable misbehavior just via mutable aliasing right now, but the restriction leaves room for future performance optimizations.

Pertinent quote:

The C FAQ defines “undefined behavior” like this:

Anything at all can happen; the Standard imposes no requirements. The program may fail to compile, or it may execute incorrectly (either crashing or silently generating incorrect results), or it may fortuitously do exactly what the programmer intended.

Author's Note: The following answer was originally written for How do intertwined scopes create a "data race"?

The compiler is allowed to optimize &mut pointers under the assumption that they are exclusive (not aliased). Your code breaks this assumption.

The example in the question is a little too trivial to exhibit any kind of interesting wrong behavior, but consider passing ref_to_i_1 and ref_to_i_2 to a function that modifies both and then does something with them:

fn main() {
    let mut i = 42;
    let ref_to_i_1 = unsafe { &mut *(&mut i as *mut i32) };
    let ref_to_i_2 = unsafe { &mut *(&mut i as *mut i32) };

    foo(ref_to_i_1, ref_to_i_2);
}

fn foo(r1: &mut i32, r2: &mut i32) {
    *r1 = 1;
    *r2 = 2;
    println!("{}", r1);
    println!("{}", r2);
}

The compiler may (or may not) decide to de-interleave the accesses to r1 and r2, because they are not allowed to alias:

// The following is an illustration of how the compiler might rearrange
// side effects in a function to optimize it. Optimization passes in the
// compiler actually work on (MIR and) LLVM IR, not on raw Rust code. 
fn foo(r1: &mut i32, r2: &mut i32) {
    *r1 = 1;
    println!("{}", r1);
    *r2 = 2;
    println!("{}", r2);
}

It might even realize that the println!s always print the same values and take advantage of that fact to further rearrange foo:

fn foo(r1: &mut i32, r2: &mut i32) {
    println!("{}", 1);
    println!("{}", 2);
    *r1 = 1;
    *r2 = 2;
}

It's good that a compiler can do this optimization! (Even if Rust's compiler currently doesn't, as Doug's answer mentions.) Optimizing compilers are great because they can use transformations like those above to make code run faster (for instance, by better pipelining the code through the CPU, or by enabling the compiler to do more aggressive optimizations in a later pass). All else being equal, everybody likes their code to run fast, right?

You might say "Well, that's an invalid optimization because it doesn't do the same thing." But you'd be wrong: the whole point of &mut references is that they do not alias. There is no way to make r1 and r2 alias without breaking the rules†, which is what makes this optimization valid.

You might also think that this is a problem that only appears in more complicated code, and the compiler should therefore allow the simple examples. But bear in mind that these transformations are part of a long multi-step optimization process. It's important to uphold the properties of &mut references everywhere, so that the compiler can make minor optimizations to one section of code without needing to understand all the code.

One more thing to consider: it is your job as the programmer to choose and apply the appropriate types for your problem; asking the compiler for occasional exceptions to the &mut aliasing rule is basically asking it to do your job for you.

If you want shared mutability and are willing to forgo those optimizations, it's simple: don't use &mut. In the example, you can use &Cell<i32> instead of &mut i32, as the comments mentioned:

use std::cell::Cell;

fn main() {
    // No `mut` needed: `Cell` is mutated through shared references.
    let i = Cell::new(42);
    let ref_to_i_1 = &i;
    let ref_to_i_2 = &i;

    foo(ref_to_i_1, ref_to_i_2);
}

fn foo(r1: &Cell<i32>, r2: &Cell<i32>) {
    r1.set(1);
    r2.set(2);
    println!("{}", r1.get()); // prints 2, guaranteed
    println!("{}", r2.get()); // also prints 2
}

The types in std::cell provide interior mutability, which is jargon for "disallow certain optimizations because & references may mutate things". They aren't always quite as convenient as using &mut, but that's because using them gives you more flexibility to write code like the above.
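As one illustration of that trade-off, `RefCell` (another type in `std::cell`) moves the exclusivity check to run time. The sketch below uses a made-up helper name, `probe_borrows`, and simply reports whether a second mutable borrow is granted while the first one is still alive:

```rust
use std::cell::RefCell;

/// Returns (second borrow granted while the first is alive,
///          borrow granted after the first was released).
/// (`probe_borrows` is a hypothetical helper for this sketch.)
fn probe_borrows(cell: &RefCell<i32>) -> (bool, bool) {
    let mut first = cell.borrow_mut();
    *first += 1;
    // `try_borrow_mut` returns `Err` instead of panicking when the value
    // is already mutably borrowed.
    let while_borrowed = cell.try_borrow_mut().is_ok();
    drop(first);
    let after_release = cell.try_borrow_mut().is_ok();
    (while_borrowed, after_release)
}
```

The aliasing rule is the same as for `&mut`; it is just enforced dynamically, which costs a small runtime check on each borrow.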

Also read

  • The Problem With Single-threaded Shared Mutability describes how having multiple mutable references can cause soundness issues even in the absence of multiple threads and data races.
  • Dan Hulme's answer illustrates how aliased mutation of more complex data can also cause undefined behavior (even before compiler optimizations).

† Be aware that using unsafe by itself does not count as "breaking the rules". &mut references cannot be aliased, even when using unsafe, in order for your code to have defined behavior.

The simplest example I know of is trying to push into a Vec that's borrowed:

let mut v = vec!['a'];
let c = &v[0];
v.push('b');
dbg!(c);

This is a compiler error:

error[E0502]: cannot borrow `v` as mutable because it is also borrowed as immutable
 --> src/main.rs:4:5
  |
3 |     let c = &v[0];
  |              - immutable borrow occurs here
4 |     v.push('b');
  |     ^^^^^^^^^^^ mutable borrow occurs here
5 |     dbg!(c);
  |          - immutable borrow later used here

It's good that this is a compiler error, because otherwise it would be a use-after-free. push can reallocate the Vec's heap storage, which would invalidate our c reference. Rust doesn't actually know what push does; all Rust knows is that push takes &mut self, and here that violates the aliasing rule.

Many other single-threaded examples of undefined behavior involve destroying objects on the heap like this. But if we play around a bit with references and enums, we can express something similar without heap allocation:
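One way to satisfy the borrow checker here, sketched under the assumption that a copy of the element is good enough (the function name `push_after_copy` is made up), is to copy the value out so that no borrow of the vector is still alive at the push:

```rust
/// Returns the first element as read before the push, plus the new length.
/// (`push_after_copy` is a hypothetical name for this sketch.)
fn push_after_copy() -> (char, usize) {
    let mut v = vec!['a'];
    // `char` is `Copy`, so this copies the value; no borrow of `v` remains.
    let c = v[0];
    // Accepted: nothing borrows `v` at this point.
    v.push('b');
    (c, v.len())
}
```

For element types that are not `Copy`, the equivalent options would be cloning the element or re-indexing the vector after the push.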

enum MyEnum<'a> {
    Ptr(&'a i32),
    Usize(usize),
}
let my_int = 42;
let mut my_enum = MyEnum::Ptr(&my_int);
let my_int_ptr_ptr: &&i32 = match &my_enum {
    MyEnum::Ptr(i) => i,
    MyEnum::Usize(_) => unreachable!(),
};
my_enum = MyEnum::Usize(0xdeadbeefdeadbeef);
dbg!(**my_int_ptr_ptr);

Here we've taken a pointer to my_int, stored that pointer in my_enum, and made my_int_ptr_ptr point into my_enum. If we could then reassign my_enum, we could trash the bits that my_int_ptr_ptr was pointing to. A double dereference of my_int_ptr_ptr would be a wild pointer read, which would probably segfault. Luckily, this is another violation of the aliasing rule, and it won't compile:

error[E0506]: cannot assign to `my_enum` because it is borrowed
  --> src/main.rs:12:1
   |
8  | let my_int_ptr_ptr: &&i32 = match &my_enum {
   |                                   -------- borrow of `my_enum` occurs here
...
12 | my_enum = MyEnum::Usize(0xdeadbeefdeadbeef);
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ assignment to borrowed `my_enum` occurs here
13 | dbg!(**my_int_ptr_ptr);
   |      ---------------- borrow later used here

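The term "aliasing" is typically used to identify situations where changing the order of operations that involve different references would change the effect of those operations. If multiple references to an object are stored in different places, but the object is not modified during the lifetime of those references, a compiler may usefully hoist, defer, or consolidate operations that use those references without affecting program behavior.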

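For example, if a compiler sees code that reads the contents of the object referenced by x, then does something with the object referenced by y, and then reads the contents of the object referenced by x again, and if the compiler knows that the operation on y cannot have modified the object referenced by x, the compiler may consolidate the two reads of x into a single read.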
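The read-merging described above can be sketched in Rust as follows. Both functions are illustrative (the names `compute` and `compute_merged` are assumptions for this example); the second is hand-written to show what an optimizer is permitted to produce, not actual compiler output:

```rust
/// Written naively: reads `*x` twice, with a possible write to `*y` in between.
fn compute(x: &u32, y: &mut u32) {
    if *x > 10 {
        *y = 1;
    }
    if *x > 5 {
        *y *= 2;
    }
}

/// What an optimizer may turn it into: `*x` is read once and kept in a local.
/// This is only equivalent because `x` and `y` cannot refer to the same object.
fn compute_merged(x: &u32, y: &mut u32) {
    let cached = *x;
    if cached > 10 {
        *y = 2;
    } else if cached > 5 {
        *y *= 2;
    }
}
```

If `x` and `y` could alias, the write `*y = 1` would change the value seen by the second read of `*x`, and the two versions would no longer agree.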

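If programmers had unlimited freedom to use and store references however they saw fit, determining in all cases whether an operation on one reference might affect another would be an intractable problem. Rust, however, seeks to handle the two easy cases:

  1. If an object will never be modified during the lifetime of a reference, machine code using the reference won't have to worry about which operations might change it during that lifetime, since no operation could possibly do so.

  2. If, during the lifetime of a reference, an object will only be modified through references that are visibly based on that reference, machine code using the reference won't have to worry about whether operations using it will interact with operations involving seemingly unrelated references, because no seemingly unrelated reference will identify the same object.

Allowing for the possibility of aliasing between mutable references would make things much more complicated, since many optimizations that can be performed interchangeably with unshared references to mutable objects or shareable references to immutable objects could no longer be performed. Once a language supports situations where operations involving seemingly independent references need to be processed in a precisely sequenced fashion, it is hard for a compiler to know when such precise sequencing is required.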


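Notice: The technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original post's address. For any questions, contact yoyou2525@163.com.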
