Most efficient way to keep collection of string references
What is the most efficient way to keep a collection of references to strings in Rust?
Specifically, I have the following as the beginning of some code to parse command line arguments (option parsing to be added):
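use std::env;

let args: Vec<String> = env::args().collect();
let mut files: Vec<&String> = Vec::new();
let mut i = 1; // args[0] is the program name
while i < args.len() {
    let arg = &args[i];
    i += 1;
    // Note: indexing [0] panics if an argument is the empty string.
    if arg.as_bytes()[0] != b'-' {
        files.push(arg);
        continue;
    }
}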
args is, as recommended in https://doc.rust-lang.org/book/ch12-01-accepting-command-line-arguments.html, declared as Vec<String>. As I understand it, that means new strings are constructed, which is mildly surprising; I would've expected that the command line arguments already exist in memory, and it would only be necessary to make a vector of references to the existing strings. But the compiler seems to concur that it needs to be Vec<String>.
It would seem inefficient to do the same for files; there is surely no need for further copying. Instead, I have declared it as Vec<&String>, which, as I understand it, means only creating a vector of references to the existing strings, which is optimal. (Not that it makes a measurable performance difference for command line arguments, but I want to figure this out now, so I can get it right later when dealing with much larger data.)
Where I am slightly confused is that Rust seems to frequently recommend str over String, and indeed the compiler is happy to have files hold either str or &str.
My best guess right now is that str, being an object that refers to a slice of a string, is most efficient when you want to keep a reference to just part of the string, but when you know you want the whole string, it is better to skip the overhead of creating a slice object, and just keep &String.
Is the above correct, or am I missing something?
"args is, as recommended in https://doc.rust-lang.org/book/ch12-01-accepting-command-line-arguments.html, declared as Vec<String>. As I understand it, that means new strings are constructed, which is mildly surprising; I would've expected that the command line arguments already exist in memory"
The command-line arguments do exist in memory, but they are not String values (they are not even guaranteed to be UTF-8), and they are not in a Vec layout. Fundamentally there isn't even any prescription as to their storage. On Unix-like systems, all you know is that they're C strings (nul-terminated) and that you get an array of pointers to those, whose last element is a null pointer.
Which is why args() returns an iterator of String: it lazily decodes and validates each argument as you request it. In fact, you can check its source code:
pub fn args() -> Args {
    Args { inner: args_os() }
}

#[stable(feature = "env", since = "1.0.0")]
impl Iterator for Args {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        self.inner.next().map(|s| s.into_string().unwrap())
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        self.inner.size_hint()
    }
}
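Since the iterator hands out freshly allocated, owned String values one at a time, one option is to move the ones you want straight into files rather than collecting everything first and borrowing afterwards. The following is only a sketch of that idea; the helper name owned_files is made up for illustration, and it filters on a leading '-' the same way the question's loop does.

```rust
use std::env;

/// Hypothetical helper: keeps the non-option arguments as owned Strings.
fn owned_files(args: impl Iterator<Item = String>) -> Vec<String> {
    args.skip(1) // the first item is conventionally the program name
        .filter(|arg| !arg.starts_with('-'))
        .collect() // each String is moved into the Vec; its bytes are not copied again
}

fn main() {
    let files = owned_files(env::args());
    println!("{:?}", files);
}
```

Moving a String transfers its (pointer, capacity, length) header, so this does not duplicate the argument text. Whether it fits depends on whether the rest of the program still needs the full args vector.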
Now, I couldn't tell you why args_os yields OsString rather than OsStr; I would assume portability of some sort (e.g. some platforms might not guarantee that the args data lives for the entirety of the program).
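The unwrap() in the source above means env::args() panics when an argument is not valid Unicode. If that matters, a program can read env::args_os() itself and decide what to do with arguments that fail to convert. A minimal sketch, assuming a made-up helper split_utf8 that simply separates the two cases:

```rust
use std::env;
use std::ffi::OsString;

/// Hypothetical helper: separates arguments that are valid Unicode from those that are not.
fn split_utf8(raw: Vec<OsString>) -> (Vec<String>, Vec<OsString>) {
    let mut valid = Vec::new();
    let mut invalid = Vec::new();
    for arg in raw {
        // into_string hands the original OsString back in Err when it is not valid Unicode.
        match arg.into_string() {
            Ok(s) => valid.push(s),
            Err(os) => invalid.push(os),
        }
    }
    (valid, invalid)
}

fn main() {
    // skip(1) drops the program name.
    let (valid, invalid) = split_utf8(env::args_os().skip(1).collect());
    println!("Unicode arguments: {:?}", valid);
    println!("non-Unicode arguments: {:?}", invalid);
}
```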
"My best guess right now is that str, being an object that refers to a slice of a string, is most efficient when you want to keep a reference to just part of the string, but when you know you want the whole string, it is better to skip the overhead of creating a slice object, and just keep &String. Is the above correct, or am I missing something?"
&String exists only for regularity (in the sense that it's a natural outgrowth of shared references and String both existing). It's not actually useful: an &String only lets you access the read-only (immutable) methods of String, all of which are really provided by str, aside from capacity() (which is rarely useful) and a handful of methods duplicated from str to String (I assume for efficiency), like len or is_empty.
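Applied to the question's code, this suggests letting files hold &str instead of &String. Here is one possible sketch; the function name non_option_args is invented for illustration. It also uses starts_with rather than indexing the first byte, so an empty argument is treated as a file name instead of causing a panic, which is a deliberate change from the original loop.

```rust
use std::env;

/// Hypothetical helper: borrows the non-option arguments from `args` as string slices.
fn non_option_args(args: &[String]) -> Vec<&str> {
    args.iter()
        .skip(1) // args[0] is conventionally the program name
        .map(String::as_str) // &String -> &str, no allocation or copying
        .filter(|arg| !arg.starts_with('-'))
        .collect()
}

fn main() {
    let args: Vec<String> = env::args().collect();
    let files = non_option_args(&args);
    println!("{:?}", files);
}
```

Each &str here covers the whole of its String; a slice does not have to be a strict part of the string it borrows from.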
&str is also generally more efficient than &String: while its size is 2 words (pointer, length) rather than one (pointer), it points directly to the relevant data rather than to a pointer to the relevant data (which also requires a dereference to access the length). As such, &String is rarely considered useful, and clippy warns against it in function parameters by default via its ptr_arg lint (likewise for &Vec, as &[] is usually better for the same reason).
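The size difference can be checked with std::mem::size_of. The sketch below is only a demonstration; ref_sizes is a made-up name, and the word size is taken to be size_of::<usize>().

```rust
use std::mem::size_of;

/// Hypothetical helper: returns (size of &String, size of &str) in bytes.
fn ref_sizes() -> (usize, usize) {
    (size_of::<&String>(), size_of::<&str>())
}

fn main() {
    let word = size_of::<usize>();
    let (string_ref, str_ref) = ref_sizes();

    // &String is a thin pointer to the String header (pointer, capacity, length).
    assert_eq!(string_ref, word);
    // &str is a fat pointer: the data pointer plus the length.
    assert_eq!(str_ref, 2 * word);

    let owned = String::from("hello");
    let via_string: &String = &owned;
    let via_str: &str = &owned; // deref coercion from &String to &str

    // Both views end up at the same bytes; &String needs one extra hop to get there.
    assert_eq!(via_string.as_ptr(), via_str.as_ptr());
    println!("&String: {} bytes, &str: {} bytes", string_ref, str_ref);
}
```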