
Store a reference to the underlying buffer from a struct that doesn't live long enough

I'm attempting to write an incremental XML parser in Rust using quick_xml.

Some of the XML files will not fit in memory (on my laptop), so I'm trying to store only the relevant chunks of each file in a Vec<u8> buffer.

Within each file chunk held in the Vec<u8>, I want to store borrows of slices in a struct, Data.

quick_xml provides a read_event method, which appends to the buffer and returns a quick_xml::events::Event (an enum containing a struct with a buf: Cow<'a, [u8]> field that borrows from the buffer).

Essentially, I want to take the data referenced by the Event and store it in my Data struct.

However, the borrow checker has a heart attack because the Event only lives for the call to read_event, and I'm trying to keep a reference to it that lives as long as the data in the buffer.

The code below implements what I have tried to describe above. Could I get some help storing a borrow to the underlying buf from an Event?

use quick_xml::events::Event;
use quick_xml::Reader;

const XML: &str = r#"<?xml version="1.0" encoding="UTF-8"?>
<RUN_SET xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <RUN xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" alias="HAP1 gene trap unselected control dataset" accession="SRR2034585" center_name="Stanford University">
    <IDENTIFIERS>
      <PRIMARY_ID>SRR2034585</PRIMARY_ID>
      <SUBMITTER_ID namespace="Stanford University">HAP1 gene trap unselected control dataset</SUBMITTER_ID>
    </IDENTIFIERS>
    <EXPERIMENT_REF accession="SRX1034759"/>
  </RUN>
</RUN_SET>
"#;

#[derive(Debug)]
struct Data<'a> {
    primary_id: Option<&'a [u8]>,
    experiment_ref: Option<&'a [u8]>,
}


fn main() {
    let mut buf: Vec<u8> = vec![];
    let mut reader = Reader::from_str(XML);
    let mut depth = 0;
    let mut path: Vec<u8> = vec![];
    reader.expand_empty_elements(true);
    let mut data = Data { primary_id: None, experiment_ref: None };
    loop {
        match reader.read_event(&mut buf) {
            Ok(Event::Start(ref e)) => {
                depth += 1;
                path.push(b"/"[0]);
                path.append(&mut e.name().to_vec());

                if path == "/RUN_SET/RUN/EXPERIMENT_REF".as_bytes() {
                    let experiment_ref = // What to put here?
                    data = Data { experiment_ref, ..data };
                }
            }
            Ok(Event::End(ref e)) => {
                depth -= 1;
                path.truncate(path.len() - e.name().len() - 1);
            }
            Ok(Event::Eof) => { break; }
            _ => {}
        }
        if depth == 1 {
            println!("{:?}", data);
            buf.clear();
            path.clear();
        }
    }
    
}

Calling read_event will expand the buffer if necessary, which can change its address and invalidate any references into it. Specifically, you are trying to call read_event, store a reference (data) pointing into the buffer, and then call read_event again, which can move the buffer.
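To make that concrete without pulling in quick_xml, here is a minimal sketch using only the standard library. The append_and_borrow function is a hypothetical stand-in for read_event (it is not part of quick_xml): it appends to the buffer and returns a slice borrowing from it, so the same borrow rules apply.

```rust
/// Hypothetical stand-in for `read_event`: appends `chunk` to `buf` and
/// returns a slice that borrows the newly appended bytes.
fn append_and_borrow<'b>(buf: &'b mut Vec<u8>, chunk: &[u8]) -> &'b [u8] {
    let start = buf.len();
    buf.extend_from_slice(chunk);
    &buf[start..]
}

fn main() {
    let mut buf: Vec<u8> = Vec::new();

    let first = append_and_borrow(&mut buf, b"SRR2034585");
    // Copy the bytes out while the borrow of `buf` is still valid.
    let first_owned: Vec<u8> = first.to_vec();

    // The next call needs `&mut buf` again and may reallocate the buffer,
    // so the borrow checker would reject any use of `first` after this line.
    let second = append_and_borrow(&mut buf, b"SRX1034759");
    let second_owned: Vec<u8> = second.to_vec();

    println!("{}", String::from_utf8_lossy(&first_owned));
    println!("{}", String::from_utf8_lossy(&second_owned));
}
```

The owned copies are independent of buf, so they stay usable however often the buffer grows or is cleared afterwards.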

It seems the best way to solve this is to move or clone the event name so that its lifetime is not bound to the buffer. Frustratingly, quick_xml::events::BytesStart<'a> seems to expose no way to move the underlying Cow<'a, [u8]> out directly, so we have to store the BytesStart object itself in order to avoid a potentially unnecessary clone.
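The idea of detaching a Cow-backed value from its buffer can be sketched with the standard library alone. The Tag type and collect_tags function below are hypothetical stand-ins, not quick_xml types; Tag only mimics the shape described above (a struct holding a Cow<'a, [u8]>), and its into_owned is my own illustration of what such a conversion does.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for an event type that borrows from the read buffer.
#[derive(Debug)]
struct Tag<'a> {
    buf: Cow<'a, [u8]>,
}

impl<'a> Tag<'a> {
    /// Detach from the buffer: borrowed bytes are copied into a new Vec,
    /// already-owned bytes are moved without copying.
    fn into_owned(self) -> Tag<'static> {
        Tag { buf: Cow::Owned(self.buf.into_owned()) }
    }
}

/// Reuses one buffer for every chunk and keeps an owned `Tag` per chunk.
fn collect_tags(chunks: &[&[u8]]) -> Vec<Tag<'static>> {
    let mut buf: Vec<u8> = Vec::new();
    let mut kept = Vec::new();
    for chunk in chunks {
        buf.extend_from_slice(chunk);
        // `tag` borrows from `buf` here ...
        let tag = Tag { buf: Cow::Borrowed(&buf[..]) };
        // ... and is detached before the buffer is reused.
        kept.push(tag.into_owned());
        buf.clear();
    }
    kept
}

fn main() {
    let chunks: [&[u8]; 2] = [&b"EXPERIMENT_REF"[..], &b"PRIMARY_ID"[..]];
    for tag in collect_tags(&chunks) {
        println!("{}", String::from_utf8_lossy(&tag.buf));
    }
}
```

Because the kept values are Tag<'static>, clearing or growing the buffer between iterations no longer conflicts with anything that was stored.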

Here is one way to do this. I made significant changes to the code in order to do what I think you intended more accurately and efficiently:

use quick_xml::events::Event;
use quick_xml::Reader;

const XML: &str = r#"<?xml version="1.0" encoding="UTF-8"?>
<RUN_SET xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <RUN xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" alias="HAP1 gene trap unselected control dataset" accession="SRR2034585" center_name="Stanford University">
    <IDENTIFIERS>
      <PRIMARY_ID>SRR2034585</PRIMARY_ID>
      <SUBMITTER_ID namespace="Stanford University">HAP1 gene trap unselected control dataset</SUBMITTER_ID>
    </IDENTIFIERS>
    <EXPERIMENT_REF accession="SRX1034759"/>
  </RUN>
</RUN_SET>
"#;

#[derive(Debug)]
struct Data<'a> {
    primary_id: Option<&'a [u8]>,
    experiment_ref: Option<quick_xml::events::BytesStart<'static>>,
}

fn main() {
    let target: &[&[u8]] = &[b"RUN_SET", b"RUN", b"EXPERIMENT_REF"];
    let mut buf: Vec<u8> = vec![];
    let mut reader = Reader::from_str(XML);
    let mut depth = 0;
    let mut good = 0;
    reader.expand_empty_elements(true);
    let mut data = Data {
        primary_id: None,
        experiment_ref: None,
    };
    loop {
        match reader.read_event(&mut buf) {
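            Ok(Event::Start(e)) => {
                // `good` counts how many leading components of `target` have matched so far.
                if depth == good && target.get(depth) == Some(&e.name()) {
                    good += 1;
                    if good == target.len() {
                        // Full path matched: detach the event from `buf` so it can outlive it.
                        data = Data {
                            experiment_ref: Some(e.into_owned()),
                            ..data
                        };
                    }
                }
                depth += 1;
            }
            Ok(Event::End(_)) => {
                depth -= 1;
                // Leaving an element discards any match deeper than the new depth.
                good = good.min(depth);
            }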
            Ok(Event::Eof) => {
                buf.clear();
                break;
            }
            _ => {}
        }
        buf.clear();
    }
    println!("{:?}", data);
}


 