Store a reference to the underlying buffer from a struct that doesn't live long enough
I'm attempting to write an incremental XML parser in Rust using quick_xml. Some of the XML files will not fit in memory (on my laptop), so I'm trying to store only the relevant chunks of each file in a Vec<u8> buffer.
Within each Vec<u8> file chunk, I want to store borrowed slices in a struct Data.
quick_xml provides a read_event method, which appends to the buffer and returns a quick_xml::events::Event (an enum containing a struct with a buf: Cow<'a, [u8]> field that borrows from the buffer).
Essentially, I want to take the data referenced by the Event and store it in my Data struct.
However, the borrow checker has a heart attack because the Event only lives for the duration of the call to read_event, and I'm trying to keep a reference to it that lives as long as the data in the buffer.
The code below is the implementation of what I have tried to describe above. Could I get some help in storing a borrow of the underlying buf from an Event?
use quick_xml::events::Event;
use quick_xml::Reader;
const XML: &str = r#"<?xml version="1.0" encoding="UTF-8"?>
<RUN_SET xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RUN xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" alias="HAP1 gene trap unselected control dataset" accession="SRR2034585" center_name="Stanford University">
<IDENTIFIERS>
<PRIMARY_ID>SRR2034585</PRIMARY_ID>
<SUBMITTER_ID namespace="Stanford University">HAP1 gene trap unselected control dataset</SUBMITTER_ID>
</IDENTIFIERS>
<EXPERIMENT_REF accession="SRX1034759"/>
</RUN>
</RUN_SET>
"#;
#[derive(Debug)]
struct Data<'a> {
    primary_id: Option<&'a [u8]>,
    experiment_ref: Option<&'a [u8]>,
}
fn main() {
    let mut buf: Vec<u8> = vec![];
    let mut reader = Reader::from_str(XML);
    let mut depth = 0;
    let mut path: Vec<u8> = vec![];
    reader.expand_empty_elements(true);
    let mut data = Data { primary_id: None, experiment_ref: None };
    loop {
        match reader.read_event(&mut buf) {
            Ok(Event::Start(ref e)) => {
                depth += 1;
                path.push(b"/"[0]);
                path.append(&mut e.name().to_vec());
                if path == "/RUN_SET/RUN/EXPERIMENT_REF".as_bytes() {
                    let experiment_ref = // What to put here?
                    data = Data { experiment_ref, ..data };
                }
            }
            Ok(Event::End(ref e)) => {
                depth -= 1;
                path.truncate(path.len() - e.name().len() - 1);
            }
            Ok(Event::Eof) => { break; }
            _ => {}
        }
        if depth == 1 {
            println!("{:?}", data);
            buf.clear();
            path.clear();
        }
    }
}
Calling read_event will cause the buffer to expand if necessary, which can change its address, so any references into it become invalid. Specifically, you are trying to call read_event, store a reference (in data) pointing into the buffer, and then call read_event again, which can move the buffer.
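The same constraint can be reproduced with the standard library alone. The following is only a minimal sketch, not quick_xml code: `copy_out` is a hypothetical helper introduced for illustration, and the byte strings are made up. It shows that an owned copy survives later growth and clearing of the buffer, whereas a borrowed slice held across those calls would be rejected by the compiler.

```rust
/// Hypothetical helper (not part of quick_xml): copies `len` bytes
/// starting at `start` out of the buffer into an independent Vec.
fn copy_out(buf: &[u8], start: usize, len: usize) -> Vec<u8> {
    buf[start..start + len].to_vec()
}

fn main() {
    let mut buf: Vec<u8> = Vec::with_capacity(4);
    buf.extend_from_slice(b"SRX1");

    // Keeping `let slice = &buf[..];` alive across the calls below would
    // not compile: they need `&mut buf`, and growing the Vec may
    // reallocate and move its contents.
    let owned = copy_out(&buf, 0, 4);

    buf.extend_from_slice(b"034759 and more data"); // may reallocate
    buf.clear();

    // The owned copy is unaffected by what happened to `buf`.
    assert_eq!(owned, b"SRX1".to_vec());
    println!("{}", String::from_utf8_lossy(&owned));
}
```

The cost is one copy per stored value, which is usually acceptable when only a few small fields are kept from each chunk.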
It seems the best way to solve this is to move/clone the event name so that its lifetime is not bound to the buffer. Frustratingly, quick_xml::events::BytesStart<'a> seems to expose no way to move out the underlying Cow<'a, [u8]> directly, so we have to store the BytesStart object itself in order to avoid a potentially unnecessary clone.
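To make the "detach from the buffer" idea concrete, here is a rough sketch built only on std::borrow::Cow. The `Name` struct and the `detach` function are hypothetical stand-ins invented for this example; they are not quick_xml types and only mimic the general shape of a wrapper around a Cow<'a, [u8]>.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for an event type that wraps a `Cow`.
struct Name<'a> {
    buf: Cow<'a, [u8]>,
}

impl<'a> Name<'a> {
    /// Converts into a value that no longer borrows from the buffer.
    /// Borrowed data is copied once; already-owned data is moved.
    fn into_owned(self) -> Name<'static> {
        Name { buf: Cow::Owned(self.buf.into_owned()) }
    }
}

/// Hypothetical helper: wraps a borrowed slice, then detaches it.
fn detach(buf: &[u8]) -> Name<'static> {
    Name { buf: Cow::Borrowed(buf) }.into_owned()
}

fn main() {
    let mut buffer = b"EXPERIMENT_REF".to_vec();
    let kept = detach(&buffer);

    // The buffer can now be reused freely; `kept` owns its bytes.
    buffer.clear();
    assert_eq!(&*kept.buf, &b"EXPERIMENT_REF"[..]);
}
```

Because the result has a 'static lifetime, it can be stored in a struct that outlives any single read into the buffer.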
Here is one way to do this. I made significant changes to the code in order to more accurately/efficiently do what I think you intended:
use quick_xml::events::Event;
use quick_xml::Reader;
const XML: &str = r#"<?xml version="1.0" encoding="UTF-8"?>
<RUN_SET xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RUN xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" alias="HAP1 gene trap unselected control dataset" accession="SRR2034585" center_name="Stanford University">
<IDENTIFIERS>
<PRIMARY_ID>SRR2034585</PRIMARY_ID>
<SUBMITTER_ID namespace="Stanford University">HAP1 gene trap unselected control dataset</SUBMITTER_ID>
</IDENTIFIERS>
<EXPERIMENT_REF accession="SRX1034759"/>
</RUN>
</RUN_SET>
"#;
#[derive(Debug)]
struct Data<'a> {
    primary_id: Option<&'a [u8]>,
    // Owned event, so it does not borrow from the read buffer.
    experiment_ref: Option<quick_xml::events::BytesStart<'static>>,
}
fn main() {
    // Element names along the path we are looking for.
    let target: &[&[u8]] = &[b"RUN_SET", b"RUN", b"EXPERIMENT_REF"];
    let mut buf: Vec<u8> = vec![];
    let mut reader = Reader::from_str(XML);
    // Current nesting depth of open elements.
    let mut depth = 0;
    // Number of leading components of `target` matched so far.
    let mut good = 0;
    reader.expand_empty_elements(true);
    let mut data = Data {
        primary_id: None,
        experiment_ref: None,
    };
    loop {
        match reader.read_event(&mut buf) {
            Ok(Event::Start(e)) => {
                if depth == good && target.get(depth) == Some(&e.name()) {
                    good += 1;
                    if good == target.len() {
                        // Detach the event from `buf` before the buffer is reused.
                        data = Data {
                            experiment_ref: Some(e.into_owned()),
                            ..data
                        };
                    }
                }
                depth += 1;
            }
            Ok(Event::End(_)) => {
                depth -= 1;
                good = good.min(depth);
            }
            Ok(Event::Eof) => {
                buf.clear();
                break;
            }
            _ => {}
        }
        // Nothing borrows from `buf` any more, so it can be cleared after every event.
        buf.clear();
    }
    println!("{:?}", data);
}