Why does Rust disallow mutable aliasing?
Rust disallows this kind of code because it is unsafe:
fn main() {
    let mut i = 42;
    let ref_to_i_1 = unsafe { &mut *(&mut i as *mut i32) };
    let ref_to_i_2 = unsafe { &mut *(&mut i as *mut i32) };
    *ref_to_i_1 = 1;
    *ref_to_i_2 = 2;
}
How can I do something bad (e.g. segmentation fault, undefined behavior, etc.) with multiple mutable references to the same thing?
The only possible issues I can see come from the lifetime of the data. Here, if i is alive, each mutable reference to it should be ok.
I can see how there might be problems when threads are introduced, but why is it prevented even if I do everything in one thread?
A really common pitfall in C++ programs, and even in Java programs, is modifying a collection while iterating over it, like this:
// pseudocode: removing during iteration invalidates the iterator
for (it: collection) {
    if (predicate(*it)) {
        collection.remove(it);
    }
}
For C++ standard library collections, this causes undefined behaviour. Maybe the iteration will work until you get to the last entry, but the last entry will dereference a dangling pointer or read off the end of an array. Maybe the whole array underlying the collection will be relocated, and it'll fail immediately. Maybe it works most of the time but fails if a reallocation happens at the wrong time. In most Java standard collections, the result is also undefined according to the library documentation, but the collections tend to throw ConcurrentModificationException, a check which causes a runtime cost even when your code is correct. Neither language can detect the error during compilation.
This is a common example of a data race caused by concurrency, even in a single-threaded environment. Concurrency doesn't just mean parallelism: it can also mean nested computation. In Rust, this kind of mistake is detected during compilation because the iterator has an immutable borrow of the collection, so you can't mutate the collection while the iterator is alive.
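One way to express "remove matching items" in Rust without mutating during iteration is Vec::retain, which does the filtering inside a single mutable borrow. The sketch below assumes a Vec<i32>; the helper name remove_matching is made up for illustration.

```rust
// Hypothetical helper: removes every element for which `predicate` is true.
// `retain` keeps the elements for which its closure returns true, so the
// predicate is negated. No iterator over `collection` is alive while the
// Vec is being modified.
fn remove_matching(collection: &mut Vec<i32>, predicate: impl Fn(&i32) -> bool) {
    collection.retain(|item| !predicate(item));
}

fn main() {
    let mut numbers = vec![1, 2, 3, 4, 5, 6];

    // The direct translation of the loop above is rejected by the borrow
    // checker, because `numbers` would be borrowed immutably by the loop
    // and mutably by `remove` at the same time:
    //
    // for (index, n) in numbers.iter().enumerate() {
    //     if n % 2 == 0 { numbers.remove(index); }
    // }

    remove_matching(&mut numbers, |n| n % 2 == 0);
    println!("{:?}", numbers);
}
```

With the even numbers removed, only the odd ones remain in their original order.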
An easier-to-understand but less common example is pointer aliasing when you pass multiple pointers (or references) to a function. A concrete example would be passing overlapping memory ranges to memcpy instead of memmove. Actually, Rust's memcpy equivalent is unsafe too, but that's because it takes pointers instead of references. The linked page shows how you can make a safe swap function using the guarantee that mutable references never alias.
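As a rough sketch of that idea (not the standard library's actual implementation), a swap can be built on std::ptr::copy_nonoverlapping, Rust's memcpy-like primitive, and still be exposed as a safe function, because two &mut T arguments are guaranteed not to overlap. The name swap_sketch is made up for illustration.

```rust
use std::mem::MaybeUninit;
use std::ptr;

// Illustrative only: swaps two values with three non-overlapping copies.
// The copies are sound because `a` and `b` are both `&mut T`, so they
// cannot refer to the same memory.
fn swap_sketch<T>(a: &mut T, b: &mut T) {
    let pa: *mut T = a;
    let pb: *mut T = b;
    // Scratch space; MaybeUninit never runs T's destructor.
    let mut tmp = MaybeUninit::<T>::uninit();
    unsafe {
        ptr::copy_nonoverlapping(pa, tmp.as_mut_ptr(), 1); // tmp <- a
        ptr::copy_nonoverlapping(pb, pa, 1);               // a   <- b
        ptr::copy_nonoverlapping(tmp.as_ptr(), pb, 1);     // b   <- tmp
    }
}

fn main() {
    let mut x = String::from("left");
    let mut y = String::from("right");
    swap_sketch(&mut x, &mut y);
    println!("{} {}", x, y);
}
```

If the two arguments could alias, the second copy would violate the non-overlap requirement of copy_nonoverlapping, which is why the same function taking raw pointers would have to be unsafe.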
A more contrived example of reference aliasing is like this:
int f(int *x, int *y) { return (*x)++ + (*y)++; }
int i = 3;
f(&i, &i); // result is undefined
You couldn't write a function call like that in Rust because you'd have to take two mutable borrows of the same variable.
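A Rust counterpart of f might look like the sketch below. It is an approximation of the C function, written with separate statements rather than post-increments. Calling it with two distinct variables is accepted, while the aliased call is left commented out because the compiler rejects it.

```rust
// Rough counterpart of the C function: returns the sum of the old values
// and increments both targets.
fn f(x: &mut i32, y: &mut i32) -> i32 {
    let sum = *x + *y;
    *x += 1;
    *y += 1;
    sum
}

fn main() {
    let mut i = 3;
    let mut j = 3;

    // Two distinct variables: accepted.
    println!("{}", f(&mut i, &mut j));

    // Same variable twice: rejected at compile time with error E0499
    // (cannot borrow `i` as mutable more than once at a time).
    // f(&mut i, &mut i);
}
```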
How can I do something bad (e.g. segmentation fault, undefined behavior, etc.) with multiple mutable references to the same thing?
I believe that although you trigger 'undefined behavior' by doing this, technically the noalias flag is not used by the Rust compiler for &mut references, so practically speaking, at the time of writing, you probably can't actually trigger undefined behavior this way. What you're triggering is 'implementation specific behavior', which is 'behaves like C++ according to LLVM'.
See "Why does the Rust compiler not optimize code assuming that two mutable references cannot alias?" for more information.
I can see how there might be problems when threads are introduced, but why is it prevented even if I do everything in one thread?
Have a read of this series of blog articles about undefined behavior.
In my opinion, race conditions (like iterators) aren't really a good example of what you're talking about; in a single-threaded environment you can avoid that sort of problem if you're careful. This is no different to creating an arbitrary pointer to invalid memory and writing to it; just don't do it. You're no worse off than using C.
To understand the issue here, consider that when compiling in release mode the compiler may or may not reorder statements when optimizations are performed; that means that although your code may appear to run in the linear sequence:
a; b; c;
there is no guarantee that the compiled program will execute them in that sequence if (according to what the compiler knows) there is no logical reason that the statements must be performed in a specific sequence. Part 3 of the blog series I've linked to above demonstrates how this can cause undefined behavior.
tl;dr: Basically, the compiler may perform various optimizations; these are guaranteed to keep your program behaving in a deterministic fashion if and only if your program does not trigger undefined behavior.
As far as I'm aware the Rust compiler currently doesn't use many 'advanced optimizations' that may cause this kind of failure, but there is no guarantee that it won't in the future. It is not a 'breaking change' to introduce new compiler optimizations.
So... it's actually probably quite unlikely you'll be able to trigger actual undefined behavior just via mutable aliasing right now; but the restriction allows the possibility of future performance optimizations.
Pertinent quote:
The C FAQ defines "undefined behavior" like this:
Anything at all can happen; the Standard imposes no requirements. The program may fail to compile, or it may execute incorrectly (either crashing or silently generating incorrect results), or it may fortuitously do exactly what the programmer intended.
Author's Note: The following answer was originally written for "How do intertwined scopes create a "data race"?"
The compiler is allowed to optimize &mut pointers under the assumption that they are exclusive (not aliased). Your code breaks this assumption.
The example in the question is a little too trivial to exhibit any kind of interesting wrong behavior, but consider passing ref_to_i_1 and ref_to_i_2 to a function that modifies both and then does something with them:
fn main() {
    let mut i = 42;
    let ref_to_i_1 = unsafe { &mut *(&mut i as *mut i32) };
    let ref_to_i_2 = unsafe { &mut *(&mut i as *mut i32) };
    foo(ref_to_i_1, ref_to_i_2);
}

fn foo(r1: &mut i32, r2: &mut i32) {
    *r1 = 1;
    *r2 = 2;
    println!("{}", r1);
    println!("{}", r2);
}
The compiler may (or may not) decide to de-interleave the accesses to r1 and r2, because they are not allowed to alias:
// The following is an illustration of how the compiler might rearrange
// side effects in a function to optimize it. Optimization passes in the
// compiler actually work on (MIR and) LLVM IR, not on raw Rust code.
fn foo(r1: &mut i32, r2: &mut i32) {
    *r1 = 1;
    println!("{}", r1);
    *r2 = 2;
    println!("{}", r2);
}
It might even realize that the println!s always print the same value and take advantage of that fact to further rearrange foo:
fn foo(r1: &mut i32, r2: &mut i32) {
    println!("{}", 1);
    println!("{}", 2);
    *r1 = 1;
    *r2 = 2;
}
It's good that a compiler can do this optimization! (Even if Rust currently doesn't, as Doug's answer mentions.) Optimizing compilers are great because they can use transformations like those above to make code run faster (for instance, by better pipelining the code through the CPU, or by enabling the compiler to do more aggressive optimizations in a later pass). All else being equal, everybody likes their code to run fast, right?
You might say "Well, that's an invalid optimization because it doesn't do the same thing." But you'd be wrong: the whole point of &mut references is that they do not alias. There is no way to make r1 and r2 alias without breaking the rules†, which is what makes this optimization valid to do.
You might also think that this is a problem that only appears in more complicated code, and the compiler should therefore allow the simple examples. But bear in mind that these transformations are part of a long multi-step optimization process. It's important to uphold the properties of &mut references everywhere, so that the compiler can make minor optimizations to one section of code without needing to understand all the code.
One more thing to consider: it is your job as the programmer to choose and apply the appropriate types for your problem; asking the compiler for occasional exceptions to the &mut aliasing rule is basically asking it to do your job for you.
If you want shared mutability and to forego those optimizations, it's simple: don't use &mut. In the example, you can use &Cell<i32> instead of &mut i32, as the comments mentioned:
use std::cell::Cell;

fn main() {
    // No `mut` needed: a Cell is mutated through shared references.
    let i = Cell::new(42);
    let ref_to_i_1 = &i;
    let ref_to_i_2 = &i;
    foo(ref_to_i_1, ref_to_i_2);
}

fn foo(r1: &Cell<i32>, r2: &Cell<i32>) {
    r1.set(1);
    r2.set(2);
    println!("{}", r1.get()); // prints 2, guaranteed
    println!("{}", r2.get()); // also prints 2
}
The types in std::cell provide interior mutability, which is jargon for "disallow certain optimizations because & references may mutate things". They aren't always quite as convenient as using &mut, but that's because using them gives you more flexibility to write code like the above.
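Another type in std::cell, RefCell, moves the exclusivity check to run time instead of dropping it. The sketch below illustrates that trade-off: a second mutable borrow is refused while the first one is still alive. The function name second_borrow_is_refused is hypothetical.

```rust
use std::cell::RefCell;

// Returns true if RefCell refuses a second mutable borrow while the
// first one is still alive.
fn second_borrow_is_refused(value: &RefCell<i32>) -> bool {
    let _first = value.borrow_mut();
    // try_borrow_mut returns Err instead of panicking when the value is
    // already borrowed.
    let refused = value.try_borrow_mut().is_err();
    refused
}

fn main() {
    let value = RefCell::new(42);
    println!("{}", second_borrow_is_refused(&value));

    // Once the earlier borrow has ended, a new mutable borrow succeeds.
    *value.borrow_mut() = 7;
    println!("{}", *value.borrow());
}
```

The exclusivity rule still holds here. It is enforced by a borrow flag at run time rather than by the borrow checker at compile time.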
† Be aware that using unsafe by itself does not count as "breaking the rules". &mut references cannot be aliased, even when using unsafe, in order for your code to have defined behavior.
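For contrast, raw pointers carry no exclusivity guarantee, so unsafe code that really needs two handles to one location can keep them as *mut i32 rather than turning both into &mut i32. The following is a minimal sketch under that assumption; write_twice is a made-up name, and my understanding is that this pattern is accepted because both pointers derive from the same single mutable borrow.

```rust
// Writes through two raw pointers to the same i32. Only one `&mut i32`
// exists (the parameter); the aliases are raw pointers derived from it.
fn write_twice(i: &mut i32) {
    let p: *mut i32 = i;
    let q = p; // raw pointers are Copy and are allowed to alias
    unsafe {
        *p = 1;
        *q = 2;
    }
}

fn main() {
    let mut i = 42;
    write_twice(&mut i);
    println!("{}", i);
}
```

Because the compiler may not assume that p and q are distinct, the second write is the one that is observed afterwards.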
The simplest example I know of is trying to push into a Vec that's borrowed:
let mut v = vec!['a'];
let c = &v[0];
v.push('b');
dbg!(c);
This is a compiler error:
error[E0502]: cannot borrow `v` as mutable because it is also borrowed as immutable
 --> src/main.rs:4:5
  |
3 |     let c = &v[0];
  |              - immutable borrow occurs here
4 |     v.push('b');
  |     ^^^^^^^^^^^ mutable borrow occurs here
5 |     dbg!(c);
  |          - immutable borrow later used here
It's good that this is a compiler error, because otherwise it would be a use-after-free. push may reallocate the Vec's heap storage, which invalidates our c reference. Rust doesn't actually know what push does; all Rust knows is that push takes &mut self, and here that violates the aliasing rule.
Many other single-threaded examples of undefined behavior involve destroying objects on the heap like this. But if we play around a bit with references and enums, we can express something similar without heap allocation:
enum MyEnum<'a> {
    Ptr(&'a i32),
    Usize(usize),
}

let my_int = 42;
let mut my_enum = MyEnum::Ptr(&my_int);
let my_int_ptr_ptr: &&i32 = match &my_enum {
    MyEnum::Ptr(i) => i,
    MyEnum::Usize(_) => unreachable!(),
};
my_enum = MyEnum::Usize(0xdeadbeefdeadbeef);
dbg!(**my_int_ptr_ptr);
Here we've taken a pointer to my_int, stored that pointer in my_enum, and made my_int_ptr_ptr point into my_enum. If we could then reassign my_enum, we could trash the bits that my_int_ptr_ptr was pointing to. A double dereference of my_int_ptr_ptr would be a wild pointer read, which would probably segfault. Luckily, this is another violation of the aliasing rule, and it won't compile:
error[E0506]: cannot assign to `my_enum` because it is borrowed
  --> src/main.rs:12:1
   |
8  | let my_int_ptr_ptr: &&i32 = match &my_enum {
   |                                   -------- borrow of `my_enum` occurs here
...
12 | my_enum = MyEnum::Usize(0xdeadbeefdeadbeef);
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ assignment to borrowed `my_enum` occurs here
13 | dbg!(**my_int_ptr_ptr);
   |      ---------------- borrow later used here
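For comparison, here is a sketch of an ordering the borrow checker accepts: the value is read through the borrow first, the borrow ends, and only then is the enum overwritten. The function name read_then_overwrite and the replacement value 7 are made up for illustration.

```rust
enum MyEnum<'a> {
    Ptr(&'a i32),
    Usize(usize),
}

// Reads through the pointer variant, then overwrites the enum once the
// borrow used for reading has ended.
fn read_then_overwrite(my_int: &i32) -> (i32, usize) {
    let mut my_enum = MyEnum::Ptr(my_int);

    // The borrow of `my_enum` lasts only for this match; `seen` is a copy.
    let seen = match &my_enum {
        MyEnum::Ptr(i) => **i,
        MyEnum::Usize(_) => unreachable!(),
    };

    // No reference into `my_enum` is alive, so reassigning it is allowed.
    my_enum = MyEnum::Usize(7);
    let stored = match my_enum {
        MyEnum::Usize(n) => n,
        MyEnum::Ptr(_) => unreachable!(),
    };

    (seen, stored)
}

fn main() {
    let my_int = 42;
    println!("{:?}", read_then_overwrite(&my_int));
}
```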
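The term "aliasing" is typically used to identify situations where changing the order of operations involving different references would change the effect of those operations. If multiple references to an object are stored in different places, but the object is not modified during the lifetime of those references, a compiler may usefully hoist, defer, or consolidate operations using those references without affecting program behavior.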
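For example, if a compiler sees that code reads the contents of the object referenced by x, then does something with the object referenced by y, and then reads the contents of the object referenced by x again, and if the compiler knows that the action on y cannot have modified the object referenced by x, the compiler may consolidate the two reads of x into a single read.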
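The read-consolidation case can be written down as the small Rust sketch below (the function name read_write_read is made up). Since x is a shared reference and y is a mutable one, they cannot refer to the same object, so an optimizer is permitted, though not required, to reuse the first read of x in place of the second.

```rust
// Reads *x, modifies *y, then reads *x again. Because `y: &mut i32` is
// exclusive, the write through `y` cannot change `*x`, so both reads are
// known to yield the same value.
fn read_write_read(x: &i32, y: &mut i32) -> i32 {
    let first = *x;
    *y += 1;
    let second = *x; // may be merged with the first read
    first + second
}

fn main() {
    let x = 5;
    let mut y = 10;
    println!("{} {}", read_write_read(&x, &mut y), y);
}
```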
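Determining in all cases whether an action on one reference might affect another would be an intractable problem if programmers had unlimited freedom to use and store references however they saw fit. Rust, however, seeks to handle the two easy cases:
1. If an object will never be modified during the lifetime of a reference, machine code using the reference won't have to worry about what operations might change it during that lifetime, since it would be impossible for any operation to do so.
2. If, during the lifetime of a reference, an object will only be modified through references that are visibly based upon that reference, machine code using that reference won't have to worry about whether any operation using it will interact with operations involving references that appear to be unrelated, because no seemingly unrelated reference will identify the same object.
Allowing for the possibility of aliasing between mutable references would make things much more complicated, since many optimizations that can be applied interchangeably to unshared references to mutable objects or to shareable references to immutable objects could no longer be applied. Once a language supports situations where operations involving seemingly independent references need to be processed in a precisely sequenced fashion, it is hard for compilers to know when such precise sequencing is required.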