
Most efficient way to keep collection of string references

What is the most efficient way to keep a collection of references to strings in Rust?

Specifically, I have the following as the beginning of some code to parse command line arguments (option parsing to be added):

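use std::env;

fn main() {
    // env::args() yields owned Strings; collect them once.
    let args: Vec<String> = env::args().collect();
    // Borrow the non-option arguments from `args` instead of copying them.
    let mut files: Vec<&String> = Vec::new();
    let mut i = 1; // skip args[0], the program name
    while i < args.len() {
        let arg = &args[i];
        i += 1;
        // starts_with is safe on an empty argument, unlike as_bytes()[0].
        if !arg.starts_with('-') {
            files.push(arg);
            continue;
        }
    }
}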

args is, as recommended in https://doc.rust-lang.org/book/ch12-01-accepting-command-line-arguments.html, declared as Vec<String>. As I understand it, that means new strings are constructed, which is mildly surprising; I would've expected that the command line arguments already exist in memory, and it would only be necessary to make a vector of references to the existing strings. But the compiler seems to concur that it needs to be Vec<String>.

It would seem inefficient to do the same for files; there is surely no need for further copying. Instead, I have declared it as Vec<&String>, which, as I understand it, means only creating a vector of references to the existing strings, which is optimal. (Not that it makes a measurable performance difference for command line arguments, but I want to figure this out now, so I can get it right later when dealing with much larger data.)

Where I am slightly confused is that Rust seems to frequently recommend str over String, and indeed the compiler is happy to have files hold either str or &str.

My best guess right now is that str, being an object that refers to a slice of a string, is most efficient when you want to keep a reference to just part of the string, but when you know you want the whole string, it is better to skip the overhead of creating a slice object, and just keep &String.

Is the above correct, or am I missing something?

> args is, as recommended in https://doc.rust-lang.org/book/ch12-01-accepting-command-line-arguments.html, declared as Vec<String>. As I understand it, that means new strings are constructed, which is mildly surprising; I would've expected that the command line arguments already exist in memory

The command-line arguments do exist in memory, but:

  • they are not String; they are not even guaranteed to be UTF-8
  • they are not in a Vec layout

Fundamentally, there isn't even any prescription as to their storage: all you know is that they're C strings (nul-terminated) and that you get an array of pointers to those, whose last element is a null pointer.
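To make that layout concrete, here is a rough sketch of walking such an array by hand. It is not how the standard library does it; decode_argv and demo_argv are hypothetical names, and the array is built in-process from CString values purely to stand in for what the OS would provide.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;
use std::ptr;

/// Walks a null-terminated array of C string pointers (the argv shape) and
/// copies each entry out: Ok(String) if it is valid UTF-8, Err(raw bytes) if not.
///
/// # Safety
/// `argv` must point to an array that ends with a null pointer, and every
/// non-null entry must point to a nul-terminated string.
unsafe fn decode_argv(argv: *const *const c_char) -> Vec<Result<String, Vec<u8>>> {
    let mut out = Vec::new();
    let mut i = 0;
    loop {
        let entry = unsafe { *argv.add(i) };
        if entry.is_null() {
            break; // the null pointer marks the end of the array
        }
        let bytes = unsafe { CStr::from_ptr(entry) }.to_bytes();
        // Validation and copying happen here; this is why owned Strings appear.
        out.push(match std::str::from_utf8(bytes) {
            Ok(s) => Ok(s.to_owned()),
            Err(_) => Err(bytes.to_vec()),
        });
        i += 1;
    }
    out
}

/// Hypothetical stand-in for the argument block handed to a process.
fn demo_argv() -> Vec<Result<String, Vec<u8>>> {
    let owned = vec![
        CString::new("prog").unwrap(),
        CString::new("file.txt").unwrap(),
        CString::new(vec![0xffu8, 0xfe]).unwrap(), // not valid UTF-8
    ];
    let mut ptrs: Vec<*const c_char> = owned.iter().map(|c| c.as_ptr()).collect();
    ptrs.push(ptr::null());
    unsafe { decode_argv(ptrs.as_ptr()) }
}

fn main() {
    for arg in demo_argv() {
        println!("{:?}", arg);
    }
}
```

The point of the sketch is only that getting from "array of C string pointers" to something usable as a Rust string involves a validation step, and an owned copy is the simplest portable result of it.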

Which is why args is an iterator of String: it will lazily decode and validate each argument as you request it. In fact, you can check its source code:

pub fn args() -> Args {
    Args { inner: args_os() }
}
#[stable(feature = "env", since = "1.0.0")]
impl Iterator for Args {
    type Item = String;
    fn next(&mut self) -> Option<String> {
        self.inner.next().map(|s| s.into_string().unwrap())
    }
    fn size_hint(&self) -> (usize, Option<usize>) {
        self.inner.size_hint()
    }
}

Now, I couldn't tell you why args_os yields OsString rather than OsStr; I would assume portability of some sort (e.g. some platforms might not guarantee that the args data lives for the entirety of the program).
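Note the unwrap() in the source above: env::args() panics if an argument is not valid Unicode. If that matters, one option is to start from env::args_os() and convert explicitly. A minimal sketch, where partition_args is a hypothetical helper name:

```rust
use std::env;
use std::ffi::OsString;

/// Splits arguments into those that are valid UTF-8 (as String) and
/// those that are not (kept as the original OsString).
fn partition_args(args: impl Iterator<Item = OsString>) -> (Vec<String>, Vec<OsString>) {
    let mut valid = Vec::new();
    let mut invalid = Vec::new();
    for arg in args {
        // into_string hands the original OsString back on failure,
        // so nothing is lost and nothing panics.
        match arg.into_string() {
            Ok(s) => valid.push(s),
            Err(raw) => invalid.push(raw),
        }
    }
    (valid, invalid)
}

fn main() {
    let (valid, invalid) = partition_args(env::args_os());
    println!("UTF-8 arguments: {:?}", valid);
    println!("non-UTF-8 arguments: {:?}", invalid);
}
```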

> My best guess right now is that str, being an object that refers to a slice of a string, is most efficient when you want to keep a reference to just part of the string, but when you know you want the whole string, it is better to skip the overhead of creating a slice object, and just keep &String.
>
> Is the above correct, or am I missing something?

&String exists only for regularity (in the sense that it's a natural outgrowth of shared references and String existing concurrently); it's not actually useful. An &String only lets you access the read-only / immutable methods of String, all of which are really provided by str, aside from capacity() (which is rarely useful) and a handful of methods duplicated from str to String (I assume for efficiency) like len or is_empty.
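In practice this means the collection from the question can hold &str without losing anything. A small sketch, where is_option and non_option_args are hypothetical helper names and the "starts with '-'" rule is taken from the question's code:

```rust
/// Takes &str, so callers can pass a &String, a literal, or a sub-slice.
fn is_option(arg: &str) -> bool {
    arg.starts_with('-')
}

/// Borrows the non-option arguments out of `args`, skipping args[0]
/// (the program name). No string data is copied.
fn non_option_args(args: &[String]) -> Vec<&str> {
    args.iter()
        .skip(1)
        .map(String::as_str)
        .filter(|arg| !is_option(arg))
        .collect()
}

fn main() {
    let args: Vec<String> = std::env::args().collect();
    let files = non_option_args(&args);
    println!("{:?}", files);

    let owned = String::from("-v");
    let by_ref: &String = &owned;
    // Deref coercion: a &String is accepted wherever a &str is expected.
    assert!(is_option(by_ref));
    // len() gives the same answer whether called through String or str.
    assert_eq!(by_ref.len(), owned.as_str().len());
}
```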

&str is also generally more efficient than &String: while its size is 2 words (pointer, length) rather than one (pointer), it points directly to the relevant data rather than to a pointer to the relevant data (which also requires a dereference to access the length). As such, &String is rarely considered useful, and clippy will warn against it by default (the same goes for &Vec, since &[] is usually better for the same reason).
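The size and indirection claims can be checked directly. A quick sketch (reference_sizes and points_at_string_struct are hypothetical names used only for this illustration):

```rust
use std::mem::size_of;

/// Returns (size of &String, size of &str) in bytes.
fn reference_sizes() -> (usize, usize) {
    (size_of::<&String>(), size_of::<&str>())
}

/// True if the &String points somewhere other than the text bytes,
/// i.e. at the String struct, which in turn points at the bytes.
fn points_at_string_struct(s: &String) -> bool {
    let struct_addr = s as *const String as *const u8;
    let data_addr = s.as_str().as_ptr();
    struct_addr != data_addr
}

fn main() {
    let word = size_of::<usize>();
    let (ref_string, ref_str) = reference_sizes();
    assert_eq!(ref_string, word); // thin pointer
    assert_eq!(ref_str, 2 * word); // pointer + length

    // The same relationship holds for &Vec<T> versus &[T].
    assert_eq!(size_of::<&Vec<u8>>(), word);
    assert_eq!(size_of::<&[u8]>(), 2 * word);

    let s = String::from("hello");
    // Reaching the text through &String takes two hops; through &str, one.
    assert!(points_at_string_struct(&s));
    println!("&String: {} bytes, &str: {} bytes", ref_string, ref_str);
}
```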
