The most efficient way to keep a collection of references to strings

Question

What is the most efficient way to keep a collection of references to strings in Rust?

Specifically, I have the following as the beginning of some code to parse command-line arguments (option parsing to be added):

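use std::env;

fn main() {
    // env::args() yields owned Strings, so args owns every argument.
    let args: Vec<String> = env::args().collect();
    // files only borrows from args; no string data is copied.
    let mut files: Vec<&String> = Vec::new();
    let mut i = 1; // args[0] is usually the program name
    while i < args.len() {
        let arg = &args[i];
        i += 1;
        // starts_with avoids the panic arg.as_bytes()[0] would hit on an empty argument
        if !arg.starts_with('-') {
            files.push(arg);
            continue;
        }
        // option parsing to be added
    }
}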

args is declared as Vec<String>, as recommended in https://doc.rust-lang.org/book/ch12-01-accepting-command-line-arguments.html. As I understand it, that means new strings are constructed, which is mildly surprising; I would have expected the command-line arguments to already exist in memory, so that it would only be necessary to make a vector of references to the existing strings. But the compiler seems to concur that it needs to be Vec<String>.

It would seem inefficient to do the same for files; there is surely no need for further copying. Instead, I have declared it as Vec<&String>, which, as I understand it, means only creating a vector of references to the existing strings, which is optimal. (Not that it makes a measurable performance difference for command-line arguments, but I want to figure this out now so I can get it right later when dealing with much larger data.)
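A minimal sketch of the distinction drawn above, assuming a hard-coded argument list in place of env::args() so the result is predictable; borrow_rest and clone_rest are hypothetical helper names. Borrowing into a Vec<&String> keeps pointers back into args, while cloning into a Vec<String> gives every element its own heap buffer.

```rust
/// Returns borrowed views of every argument after the first.
/// No string data is copied: each `&String` points back into `args`.
fn borrow_rest(args: &[String]) -> Vec<&String> {
    args.iter().skip(1).collect()
}

/// Returns owned copies instead: each `String` gets its own heap buffer.
fn clone_rest(args: &[String]) -> Vec<String> {
    args.iter().skip(1).cloned().collect()
}

fn main() {
    // Hypothetical stand-in for `env::args().collect()`.
    let args: Vec<String> = vec!["prog".into(), "a.txt".into(), "b.txt".into()];

    let borrowed = borrow_rest(&args);
    let cloned = clone_rest(&args);

    // Same contents in both...
    assert_eq!(borrowed[0], &cloned[0]);
    // ...but only the borrowed entry shares the original allocation.
    assert_eq!(borrowed[0].as_ptr(), args[1].as_ptr());
    assert_ne!(cloned[0].as_ptr(), args[1].as_ptr());
    println!("{:?} {:?}", borrowed, cloned);
}
```

The pointer comparisons are only there to make the "no copy" claim observable; real code would not need them.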

Where I am slightly confused is that Rust seems to frequently recommend str over String, and indeed the compiler is happy to have files hold either &String or &str.
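A small sketch of why the compiler accepts &str here, assuming hard-coded arguments and a hypothetical helper non_options that mirrors the question's loop. Pushing a &String into a Vec<&str> triggers deref coercion, which only copies the pointer and length; no string data is duplicated.

```rust
/// Collects the non-option arguments as `&str`, mirroring the question's loop.
fn non_options(args: &[String]) -> Vec<&str> {
    let mut files: Vec<&str> = Vec::new();
    for arg in args.iter().skip(1) {
        // `starts_with` also handles an empty argument without panicking.
        if !arg.starts_with('-') {
            files.push(arg); // `&String` coerces to `&str` here
        }
    }
    files
}

fn main() {
    // Hypothetical arguments, standing in for `env::args().collect()`.
    let args: Vec<String> = vec!["prog".into(), "a.txt".into(), "-v".into(), "b.txt".into()];
    let files = non_options(&args);
    println!("{:?}", files); // ["a.txt", "b.txt"]

    // A `&str` can also cover just part of a string, which a `&String` cannot.
    let first_char: &str = &files[0][..1];
    println!("{}", first_char); // a
}
```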

My best guess right now is that str, being an object that refers to a slice of a string, is most efficient when you want to keep a reference to just part of the string, but when you know you want the whole string, it is better to skip the overhead of creating a slice object and just keep &String.

Is the above correct, or am I missing something?

Answer 1

> args is, as recommended in https://doc.rust-lang.org/book/ch12-01-accepting-command-line-arguments.html, declared as Vec<String>. As I understand it, that means new strings are constructed, which is mildly surprising; I would've expected that the command line arguments already exist in memory

The command-line arguments do exist in memory, but:

- they are not String; they are not even guaranteed to be UTF-8
- they are not in a Vec layout

Fundamentally, there isn't even any prescription as to their storage: all you know is that they're C strings (NUL-terminated) and that you get an array of pointers to them, whose last element is a null pointer.
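To make that layout concrete, here is a minimal sketch that builds the conventional C argv shape by hand (an array of pointers to NUL-terminated strings, ending in a null pointer) and walks it with std::ffi::CStr. The arguments and the decode_argv helper are hypothetical; this illustrates the data structure described above, not how std actually implements env::args. Note that producing each owned String is the step that allocates and deals with encoding.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;
use std::ptr;

/// Walks a null-terminated array of pointers to NUL-terminated C strings
/// and copies each one into an owned `String`.
///
/// # Safety
/// `argv` must point to an array of valid NUL-terminated strings whose last
/// element is a null pointer, and all of it must stay alive during the call.
unsafe fn decode_argv(argv: *const *const c_char) -> Vec<String> {
    let mut out = Vec::new();
    let mut p = argv;
    unsafe {
        // Stop at the terminating null pointer.
        while !(*p).is_null() {
            // Borrow the bytes up to the NUL; nothing is copied yet.
            let c = CStr::from_ptr(*p);
            // Allocation and UTF-8 handling happen here (invalid bytes become U+FFFD).
            out.push(c.to_string_lossy().into_owned());
            p = p.add(1);
        }
    }
    out
}

fn main() {
    // Hypothetical arguments, standing in for what the OS hands to a process.
    let owned: Vec<CString> = ["prog", "a.txt", "-v"]
        .iter()
        .map(|s| CString::new(*s).expect("no interior NUL"))
        .collect();
    // Array of pointers into those strings, terminated by a null pointer.
    let mut argv: Vec<*const c_char> = owned.iter().map(|s| s.as_ptr()).collect();
    argv.push(ptr::null());

    let decoded = unsafe { decode_argv(argv.as_ptr()) };
    println!("{:?}", decoded); // ["prog", "a.txt", "-v"]
}
```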

This is why args is an iterator of String: it lazily decodes and validates each argument as you request it. In fact, you can check its source code:

pub fn args() -> Args {
    Args { inner: args_os() }
}
#[stable(feature = "env", since = "1.0.0")]
impl Iterator for Args {
    type Item = String;
    fn next(&mut self) -> Option<String> {
        self.inner.next().map(|s| s.into_string().unwrap())
    }
    fn size_hint(&self) -> (usize, Option<usize>) {
        self.inner.size_hint()
    }
}
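Because Args::next calls into_string().unwrap(), a non-UTF-8 argument makes env::args() panic. If that matters, one option is to iterate env::args_os() yourself. The sketch below uses the real std APIs OsString::into_string and OsStr::to_string_lossy; the helper names checked_args and lossy_args are hypothetical.

```rust
use std::env;
use std::ffi::OsString;

/// Like `env::args()`, but keeps a non-UTF-8 argument as `Err(OsString)`
/// instead of panicking.
fn checked_args() -> Vec<Result<String, OsString>> {
    env::args_os().map(|s| s.into_string()).collect()
}

/// Alternative: replace invalid UTF-8 sequences with U+FFFD instead of failing.
fn lossy_args() -> Vec<String> {
    env::args_os().map(|s| s.to_string_lossy().into_owned()).collect()
}

fn main() {
    for (i, arg) in checked_args().into_iter().enumerate() {
        match arg {
            Ok(s) => println!("{i}: {s}"),
            Err(raw) => println!("{i}: not valid UTF-8: {raw:?}"),
        }
    }
    println!("{:?}", lossy_args());
}
```

Which variant fits depends on whether a garbled argument should be reported as an error or tolerated.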

Now, I couldn't tell you why args_os yields OsString rather than &OsStr. I would assume portability of some sort (e.g. some platforms might not guarantee that the argument data lives for the entirety of the program).

> My best guess right now is that str, being an object that refers to a slice of a string, is most efficient when you want to keep a reference to just part of the string, but when you know you want the whole string, it is better to skip the overhead of creating a slice object, and just keep &String.
>
> Is the above correct, or am I missing something?

&String exists only for regularity (in the sense that it's a natural outgrowth of shared references and String existing side by side); it's not actually useful. An &String only lets you access the read-only (immutable) methods of String, and nearly all of those are really provided by str. The exceptions are capacity() (which is rarely useful) and a handful of methods duplicated from str onto String (I assume for efficiency), such as len and is_empty.
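A short sketch of that point, using a hypothetical helper word_count: a function that takes &str accepts a &String (through deref coercion), a string literal, and a sub-slice alike, whereas a &String parameter would reject the last two. The only read-only thing the &String view adds here is capacity().

```rust
/// Hypothetical helper that only needs to read text, so it takes `&str`.
fn word_count(s: &str) -> usize {
    s.split_whitespace().count()
}

fn main() {
    let owned = String::from("alpha beta gamma");
    let r: &String = &owned;

    // `&String` coerces to `&str` at the call site, so one signature serves all callers.
    assert_eq!(word_count(r), 3);
    assert_eq!(word_count("one two"), 2);
    assert_eq!(word_count(&owned[..10]), 2); // the slice is "alpha beta"

    // Read-only queries are available through the `&str` view as well...
    assert_eq!(r.len(), r.as_str().len());
    // ...except `capacity()`, which needs the `String` itself.
    assert!(r.capacity() >= r.len());
    println!("ok");
}
```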

&str is also generally more efficient than &String: although it is 2 words (pointer, length) rather than one (pointer), it points directly to the string data, whereas an &String points to a String which in turn points to the data (so even reading the length requires an extra dereference). As such, &String is rarely considered useful, and clippy warns against it by default when it is used as a function parameter (the ptr_arg lint). The same applies to &Vec<T>, since &[T] is usually better for the same reason.
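The size difference is easy to check with std::mem::size_of. This sketch (ref_sizes is a hypothetical helper name) confirms that &String is a thin one-word pointer and &str a two-word fat pointer, and that both ultimately reach the same bytes, with &String getting there through the String value.

```rust
use std::mem::size_of;

/// Returns (size of `&String`, size of `&str`) in bytes on the current target.
fn ref_sizes() -> (usize, usize) {
    (size_of::<&String>(), size_of::<&str>())
}

fn main() {
    let (string_ref, str_ref) = ref_sizes();
    let word = size_of::<usize>();
    // `&String` is a thin pointer (one word) to the `String` value.
    assert_eq!(string_ref, word);
    // `&str` is a fat pointer: address of the bytes plus their length (two words).
    assert_eq!(str_ref, 2 * word);

    let s = String::from("hello");
    let via_string: &String = &s;
    let via_str: &str = &s;
    // Both reach the same bytes; `via_str` stores that address itself, while
    // `via_string` has to go through the `String` to find it.
    assert_eq!(via_string.as_ptr(), via_str.as_ptr());
    println!("&String: {string_ref} bytes, &str: {str_ref} bytes");
}
```

So a Vec<&str> uses twice the space per element of a Vec<&String>, in exchange for one less indirection on every access.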

Answer 1 (score 0), 2021-12-20 10:35:34

保持字符串引用集合的最有效方法

问题描述

1 个解决方案

解决方案1 0 2021-12-20 10:35:34

解决方案1
0 2021-12-20 10:35:34