
Rust crate including a large source file to initialize a Trie structure cannot compile (OOM)

I have a Trie structure implemented in Rust, which I took from somebody else and which seems to work just fine:

use std::collections::HashMap;
use std::hash::Hash;

pub struct Trie<K, V>
where
    K: std::fmt::Debug + Eq + Hash + Clone,
    V: std::fmt::Debug + Clone,
{
    value: Option<V>,
    children: HashMap<K, Trie<K, V>>,
}

impl<K, V> Trie<K, V>
where
    K: std::fmt::Debug + Eq + Hash + Clone,
    V: std::fmt::Debug + Clone,
{
    pub fn new() -> Trie<K, V> {
        Trie {
            value: None,
            children: HashMap::new(),
        }
    }

    pub fn insert(&mut self, path: Vec<K>, v: V) {
        if path.is_empty() {
            match self.value {
                Some(_) => panic!("key exists"),
                None => {
                    self.value = Some(v);
                }
            }
            return;
        }

        self.children
            .entry(path[0].clone())
            .or_insert(Trie::new())
            .insert(path[1..].to_vec(), v)
    }

    pub fn fetch_or_default(&self, path: Vec<K>, default: V) -> V {
        let value = self.value.clone().unwrap_or(default);
        match path.len() {
            0 => value,
            _ => match self.children.get(&path[0]) {
                Some(child) => child.fetch_or_default(path[1..].to_vec(), value),
                None => value,
            },
        }
    }
}
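For reference, the structure behaves as expected on small inputs. Below is a self-contained check (the Trie definition is repeated so the snippet compiles on its own; the `words_trie` setup mirrors the shape of the generated file, with a root fallback value and one word entry):

```rust
use std::collections::HashMap;
use std::hash::Hash;

pub struct Trie<K, V>
where
    K: std::fmt::Debug + Eq + Hash + Clone,
    V: std::fmt::Debug + Clone,
{
    value: Option<V>,
    children: HashMap<K, Trie<K, V>>,
}

impl<K, V> Trie<K, V>
where
    K: std::fmt::Debug + Eq + Hash + Clone,
    V: std::fmt::Debug + Clone,
{
    pub fn new() -> Trie<K, V> {
        Trie { value: None, children: HashMap::new() }
    }

    pub fn insert(&mut self, path: Vec<K>, v: V) {
        if path.is_empty() {
            match self.value {
                Some(_) => panic!("key exists"),
                None => self.value = Some(v),
            }
            return;
        }
        self.children
            .entry(path[0].clone())
            .or_insert(Trie::new())
            .insert(path[1..].to_vec(), v)
    }

    pub fn fetch_or_default(&self, path: Vec<K>, default: V) -> V {
        let value = self.value.clone().unwrap_or(default);
        match path.len() {
            0 => value,
            _ => match self.children.get(&path[0]) {
                Some(child) => child.fetch_or_default(path[1..].to_vec(), value),
                None => value,
            },
        }
    }
}

fn main() {
    let mut words_trie: Trie<char, usize> = Trie::new();
    // Mirrors the generated file: root fallback value, then one word entry.
    words_trie.insert(vec![], 150);
    words_trie.insert(vec![' ', 'n', '\''], 0);
    // A known path returns its stored value.
    assert_eq!(words_trie.fetch_or_default(vec![' ', 'n', '\''], 99), 0);
    // An unknown path falls back to the deepest value found along it (here the root's 150).
    assert_eq!(words_trie.fetch_or_default(vec!['x'], 99), 150);
}
```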

I have a special file where I initialize an instance of this Trie from a dictionary containing 150k words, to each of which I assign a number between 0 and 150.

The data was generated via JavaScript, so it's not impossibly huge, yet for some reason the Rust compiler cannot handle it and crashes out of memory, despite my computer having 16 GB of RAM + 2 GB of swap (which it entirely uses before crashing).

use crate::data::*;

/*
a.reduce((str,ai,i) => { return str + ai.map((w,i) => `    //${w}\n    words_trie.insert(vec![' ',${w.split('').reverse().map(c => `'${c==`'`?`\\`:''}${c}'`).join(',')}], ${i});`).join('\n') + '\n' }, '');
*/

pub fn into(words_trie: &mut Trie<char, usize>) {
    // SPECIAL LINE, DO NOT REPLACE BY ABOVE FORMULA
    words_trie.insert(vec![], 150);
    // ================================================
    //'n
    words_trie.insert(vec![' ', 'n', '\''], 0);
    ...
    //(150 000 times a comment followed by an insertion line)
    ...
}

The error message I get is the following, but that's just the OOM killer at work:

process didn't exit successfully: `rustc --edition=2018 --crate-name postag_nl_v1 src/main.rs --color always --crate-type bin --emit=dep-info,link -C debuginfo=2 -C metadata=f33cf4c96979227f -C extra-filename=-f33cf4c96979227f --out-dir /home/fremy/Documents/postag_nl_v1/target/debug/deps -C incremental=/home/fremy/Documents/postag_nl_v1/target/debug/incremental -L dependency=/home/fremy/Documents/postag_nl_v1/target/debug/deps` (signal: 9, SIGKILL: kill)

How can I get my Trie initialized from the dictionary without making the compiler crash?

[EDIT] Here is a minimal repro, with dummy data:

https://gist.github.com/FremyCompany/1f6132441338f8b219d961c7254bd8ac#file-main-rs

https://1drv.ms/u/s!AhVrPgyThAkTgQ8v2U99Ffzug7rJ [1Mb zip]
