
Implementing indexing with interior mutability

Consider, for the sake of simplicity, that I want to implement an indexable vector v with n consecutive elements 0, 1, ..., n-1, i.e. v[i] = i. This vector is supposed to be filled on demand: if v[i] is used and the vector currently contains n < i+1 elements, the values n, n+1, ..., i are first pushed onto v, and then a reference to v[i] is returned.

The code below works fine.

struct LazyVector {
    data: Vec<usize>
}

impl LazyVector {
    fn new() -> LazyVector {
        LazyVector{
            data: vec![] 
        }
    }
    fn get(&mut self, i:usize) -> &usize {
        for x in self.data.len()..=i {
            // Push x (not i) so that v[k] == k holds for every position k.
            self.data.push(x);
        }
        &self.data[i]
    }
}


pub fn main() {
    let mut v = LazyVector::new();
    println!("v[5]={}",v.get(5)); // prints v[5]=5
}

However, the code above is just a mock-up of the actual structure I'm trying to implement. In addition, (1) I'd like to be able to use the index operator, and (2) although the vector may actually be modified when a position is accessed, I'd like that to be transparent to the user; that is, I'd like to be able to index any position even if I only have an immutable reference to v. Immutable references are preferred to prevent other unwanted modifications.

Requirement (1) could be achieved by implementing the Index trait, like so:

impl std::ops::Index<usize> for LazyVector {
    type Output = usize;
    fn index(&self, i: usize) -> &Self::Output {
        self.get(i)
    }
}

However, this does not compile, since we need a mutable reference in order to call LazyVector::get. Because of requirement (2) we do not want to make this reference mutable, and even if we did, we couldn't, since it would violate the interface of the Index trait. I figured that this would make the case for the interior mutability pattern through the RefCell smart pointer (as in Chapter 15 of The Rust Book). So I came up with something like

struct LazyVector {
    data: std::cell::RefCell<Vec<usize>>
}

impl LazyVector {
    fn new() -> LazyVector {
        LazyVector{
            data: std::cell::RefCell::new(vec![]) 
        }
    }

    fn get(&self, i:usize) -> &usize {
        let mut mutref = self.data.borrow_mut();
        for x in mutref.len()..=i {
            mutref.push(x)
        }
        &self.data.borrow()[i] // error: cannot return value referencing a temporary value
    }
}

However, this doesn't work because it tries to return a value referencing the Ref struct returned by borrow(), which goes out of scope at the end of LazyVector::get. Finally, to circumvent that, I did something like

struct LazyVector {
    data: std::cell::RefCell<Vec<usize>>
}


impl LazyVector {
    fn new() -> LazyVector {
        LazyVector{
            data: std::cell::RefCell::new(vec![]) 
        }
    }

    fn get(&self, i:usize) -> &usize {
        let mut mutref = self.data.borrow_mut();
        for x in mutref.len()..=i {
            mutref.push(x)
        }
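        unsafe { // Argh!
            // Caution: this reference is not tracked by the RefCell. A later call
            // that grows the Vec may reallocate it and leave this reference dangling.
            let ptr = self.data.as_ptr();
            &std::ops::Deref::deref(&*ptr)[i]
        }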
    }
}


impl std::ops::Index<usize> for LazyVector {
    type Output = usize;
    fn index(&self, i: usize) -> &Self::Output {
        self.get(i)
    }
}

pub fn main() {
    let v = LazyVector::new();    // Immutable!
    println!("v[5]={}",v.get(5)); // prints v[5]=5
}

Now it works as required but, as a newbie, I am not so sure about the unsafe block! I think I am effectively wrapping it with a safe interface, but I'm not sure. So my question is whether that is OK, or whether there is a better, totally safe way to achieve that.

Thanks for any help.

EDIT: Since you provided more information about your goal (lazy access to chunks of a huge file that lies on disk), I have updated my answer.

You can use cells, as you tried. To quote the documentation:

Since cell types enable mutation where it would otherwise be disallowed though, there are occasions when interior mutability might be appropriate, or even must be used, e.g. [...] Implementation details of logically-immutable methods. [...]

Here's a piece of code that does the job (note that it is very close to what you wrote):

use std::cell::RefCell;
use std::ops::Index;

// This is your file
const DATA: &str = "Rust. A language empowering everyone to build reliable and efficient software.";

#[derive(Debug)]
struct LazyVector<'a, 'b> {
    ref_v: RefCell<&'a mut Vec<&'b str>>
}

impl<'a, 'b> LazyVector<'a, 'b> {
    fn new(v: &'a mut Vec<&'b str>) -> LazyVector<'a, 'b> {
        LazyVector {
            ref_v: RefCell::new(v)
        }
    }

    /// get or load a chunk of two letters
    fn get_or_load(&self, i: usize) -> &'b str {
        let mut v = self.ref_v.borrow_mut();
        for k in v.len()..=i {
            v.push(&DATA[k * 2..k * 2 + 2]);
        }
        v[i]
    }
}

impl<'a, 'b> Index<usize> for LazyVector<'a, 'b> {
    type Output = str;
    fn index(&self, i: usize) -> &Self::Output {
        self.get_or_load(i)
    }
}

pub fn main() {
    let mut v = vec![];
    let lv = LazyVector::new(&mut v);
    println!("v[5]={}", &lv[5]); // v[5]=ng
    println!("{:?}", lv); // LazyVector { ref_v: RefCell { value: ["Ru", "st", ". ", "A ", "la", "ng"] } }
    println!("v[10]={}", &lv[10]); // v[10]=ow
    println!("{:?}", lv); // LazyVector { ref_v: RefCell { value: ["Ru", "st", ". ", "A ", "la", "ng", "ua", "ge", " e", "mp", "ow"] } }
}

The main difference from your attempt is that the underlying Vec is an external mutable vector, and that LazyVector only holds a (mutable) reference to it. Note also that get_or_load copies the &'b str out of the vector instead of returning a reference into the RefCell, which is why no unsafe code is needed. For concurrent access, a RwLock should be the way to go.
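As a rough illustration of the RwLock remark, here is a minimal sketch of a thread-safe variant. It assumes the chunks are slices of the same static DATA string, so they can be stored as &'static str; the name SharedLazyVector and the split between a read-lock fast path and a write-lock slow path are my own choices for the example, not something taken from the code above.

```rust
use std::ops::Index;
use std::sync::RwLock;

// Stand-in for the file contents, as in the snippet above.
const DATA: &str = "Rust. A language empowering everyone to build reliable and efficient software.";

// Hypothetical thread-safe variant: the vector of chunks lives behind a RwLock.
struct SharedLazyVector {
    chunks: RwLock<Vec<&'static str>>,
}

impl SharedLazyVector {
    fn new() -> Self {
        SharedLazyVector { chunks: RwLock::new(Vec::new()) }
    }

    /// Get or load a chunk of two letters.
    fn get_or_load(&self, i: usize) -> &'static str {
        // Fast path: a shared read lock is enough if the chunk is already loaded.
        {
            let chunks = self.chunks.read().unwrap();
            if let Some(&chunk) = chunks.get(i) {
                return chunk;
            }
        }
        // Slow path: take the exclusive write lock and load the missing chunks.
        // Another thread may have extended the vector in the meantime, so the
        // loop starts from the current length.
        let mut chunks = self.chunks.write().unwrap();
        for k in chunks.len()..=i {
            chunks.push(&DATA[k * 2..k * 2 + 2]);
        }
        chunks[i]
    }
}

impl Index<usize> for SharedLazyVector {
    type Output = str;
    fn index(&self, i: usize) -> &str {
        self.get_or_load(i)
    }
}

fn main() {
    let lv = SharedLazyVector::new();
    // Two threads index the same structure through a shared reference.
    std::thread::scope(|s| {
        s.spawn(|| println!("lv[5]={}", &lv[5]));
        s.spawn(|| println!("lv[10]={}", &lv[10]));
    });
}
```

As in the RefCell version, the method hands out a copy of the &'static str rather than a reference into the locked vector, so nothing borrowed from the lock guard escapes.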

However, I wouldn't recommend that solution:

First, your underlying Vec will rapidly grow and become as huge as the file on disk. Hence, you'll need a map instead of a vector, and you'll have to keep the number of chunks in that map under a given bound. If you ask for a chunk that is not in memory, you'll have to choose a chunk to remove. That's simply paging, and the OS is generally better at this game than you are (see page replacement algorithm). As I wrote in a comment, memory-mapped files (and maybe shared memory in the case of "heavy" processes) would be more efficient: the OS handles the lazy loading of the file and the sharing of the read-only data. R. Sedgewick's remark in Algorithms in C, first edition, chapter 13, section "An Easier Way", explains why sorting a huge file (bigger than memory) may be easier than one might think:

In a good virtual-memory system, the programmer can address a very large amount of data, leaving to the system the responsibility of making sure that the addressed data is transferred from external to internal storage when needed.
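To make the first point more concrete, a bounded map of chunks could look roughly like the sketch below. Everything in it is an assumption made for illustration: the names BoundedChunkCache, CacheState, resident and is_resident are hypothetical, the "file" is again the static DATA string, and the eviction policy is plain first-in-first-out, which is far simpler than what a real page replacement algorithm does.

```rust
use std::cell::RefCell;
use std::collections::{HashMap, VecDeque};

// Stand-in for the file contents.
const DATA: &str = "Rust. A language empowering everyone to build reliable and efficient software.";
const CHUNK_LEN: usize = 2;

// Mutable part of the cache, kept behind a RefCell.
struct CacheState {
    chunks: HashMap<usize, &'static str>, // chunk index -> chunk
    order: VecDeque<usize>,               // loaded chunk indices, oldest first
}

// Hypothetical cache that keeps at most `capacity` chunks in memory.
struct BoundedChunkCache {
    capacity: usize,
    state: RefCell<CacheState>,
}

impl BoundedChunkCache {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be at least 1");
        BoundedChunkCache {
            capacity,
            state: RefCell::new(CacheState {
                chunks: HashMap::new(),
                order: VecDeque::new(),
            }),
        }
    }

    /// Returns chunk `i`, loading it first if needed.
    /// Panics if the chunk lies outside DATA.
    fn get_or_load(&self, i: usize) -> &'static str {
        let mut guard = self.state.borrow_mut();
        let state = &mut *guard;
        if let Some(&chunk) = state.chunks.get(&i) {
            return chunk;
        }
        // The cache is full: evict the chunk that was loaded first (FIFO).
        if state.chunks.len() == self.capacity {
            if let Some(oldest) = state.order.pop_front() {
                state.chunks.remove(&oldest);
            }
        }
        let chunk = &DATA[i * CHUNK_LEN..(i + 1) * CHUNK_LEN];
        state.chunks.insert(i, chunk);
        state.order.push_back(i);
        chunk
    }

    /// Number of chunks currently held in memory.
    fn resident(&self) -> usize {
        self.state.borrow().chunks.len()
    }

    /// Whether chunk `i` is currently held in memory.
    fn is_resident(&self, i: usize) -> bool {
        self.state.borrow().chunks.contains_key(&i)
    }
}

fn main() {
    let cache = BoundedChunkCache::new(3);
    for i in [5, 10, 0, 1] {
        println!("chunk {} = {:?}", i, cache.get_or_load(i));
    }
    // Loading the fourth chunk evicted chunk 5, the oldest one.
    println!("resident = {}, chunk 5 resident = {}", cache.resident(), cache.is_resident(5));
}
```

Even this small version has to track load order and pick a victim, which is the bookkeeping that memory-mapped files leave to the OS.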

Second, see my previous answer below.

PREVIOUS ANSWER

I coded this kind of vector once... in Java. The use case was to represent a very sparse grid (many of the rows were only a few cells wide, but the grid was supposed to have a width of 1024). To avoid having to add cells manually when needed, I created a "list" that did roughly what you are trying to achieve (but there was only one default value).

At first, I made my list implement the List interface, but I quickly realized that I had to write a lot of useless (and slow) code to avoid breaking the Liskov substitution principle. Worse, the behavior of some methods was misleading compared with the usual lists (ArrayList, LinkedList, ...).

It seems you are in the same situation: you would like your LazyVector to look like a usual Vec, and that's why you want to implement the Index and maybe IndexMut traits. But you are looking for workarounds to achieve this (e.g. unsafe code to match the trait method signatures).

My advice is: do not try to make LazyVector look like a usual vector, but make it clear that the LazyVector is not a usual vector. This is the principle of least astonishment. For example, replace get (which a user would, in good faith, expect only to read the data) with get_or_extend, which makes it clear that either you get something or you create it. If you add a get_or_extend_mut function, you have something that is not very attractive, but efficient and predictable:

impl LazyVector {
    fn new() -> LazyVector { ... }

    fn get_or_extend(&mut self, i: usize) -> &usize { ... }

    fn get_or_extend_mut(&mut self, i: usize) -> &mut usize { ... }
}
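The elided bodies above could be filled in along the following lines. This is only a sketch under the question's assumption that v[i] = i; the private helper extend_to is a hypothetical name introduced here to share the growth logic between the two accessors.

```rust
struct LazyVector {
    data: Vec<usize>,
}

impl LazyVector {
    fn new() -> LazyVector {
        LazyVector { data: Vec::new() }
    }

    /// Hypothetical helper: grows the vector until index `i` exists,
    /// filling each new position k with the value k.
    fn extend_to(&mut self, i: usize) {
        for x in self.data.len()..=i {
            self.data.push(x);
        }
    }

    /// Returns a shared reference to element `i`, creating it if needed.
    fn get_or_extend(&mut self, i: usize) -> &usize {
        self.extend_to(i);
        &self.data[i]
    }

    /// Returns a mutable reference to element `i`, creating it if needed.
    fn get_or_extend_mut(&mut self, i: usize) -> &mut usize {
        self.extend_to(i);
        &mut self.data[i]
    }
}

fn main() {
    let mut v = LazyVector::new();
    println!("v[5]={}", v.get_or_extend(5)); // prints v[5]=5
    *v.get_or_extend_mut(2) = 42;
    println!("v[2]={}", v.get_or_extend(2)); // prints v[2]=42
}
```

Because both methods take &mut self, the borrow checker guarantees that no reference returned earlier is still alive when the vector grows, so no RefCell and no unsafe block are required.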
