
Use par_split on a String, process using rayon and collect result in a Vector

I am trying to read a file into a string messages, defined on line 14. The file contains several blocks, where each block starts with a number. After I read the file contents into the string messages, the blocks are separated by newlines and the lines within a block are separated by __SEP__. I would like to use par_split() on messages, process each block with rayon, and collect the output of every block into a vector vec_final, e.g. by calling collect() on line 54 or through some similar mechanism, so that vec_final contains the vec_local produced on line 53 for each block. Any pointers on how I can achieve this are highly appreciated.

My code is as follows:

use rayon::prelude::*;
fn starts_with_digit_or_at_sign(inp: &str) -> bool {
    if inp.len() > 0 {
        let ch = inp.chars().next().unwrap();
        if ch.is_numeric() || ch == '@' {
            return true;
        }
    }
    return false;
}
fn main() {
    let filepath = "inp.log";
    let data = std::fs::read_to_string(filepath).expect("file not found!");
    let mut messages: String = String::from("");
    let separator_char = '\n';
    let separator: String = String::from("__SEP__");
    let mut found_first_message = false;
    let mut start_of_new_msg = false;
    let mut line_num = 0;
    for line in data.lines() {
        line_num += 1;
        if line.len() > 0 {
            if starts_with_digit_or_at_sign(line) {
                start_of_new_msg = true;
                if !found_first_message {
                    found_first_message = true;
                } else {
                    messages.push(separator_char);
                }
            }
            if found_first_message {
                if !start_of_new_msg {
                    messages.push_str(&separator);
                }
                messages.push_str(line);
                if start_of_new_msg {
                    start_of_new_msg = false;
                    let mut tmp = String::from("Lnumber ");
                    tmp.push_str(&line_num.to_string());
                    messages.push_str(&separator);
                    messages.push_str(&tmp);
                }
            }
        }
    }
    messages.par_split(separator_char).for_each(|l| {
        println!(
            "line: '{}' len: {}, {}",
            l,
            l.len(),
            rayon::current_num_threads()
        );
        let vec_local: Vec<i32> = vec![l.len() as i32];
    }); // <-- line 54
}

The output produced by the code is as follows:

line: '1__SEP__Lnumber 1__SEP__a__SEP__b__SEP__c' len: 41, 8
line: '3__SEP__Lnumber 9__SEP__g__SEP__h__SEP__i' len: 41, 8
line: '2__SEP__Lnumber 5__SEP__d__SEP__e__SEP__f' len: 41, 8
line: '4__SEP__Lnumber 13__SEP__j__SEP__k__SEP__l' len: 42, 8

The file inp.log is as follows:

1
a
b
c
2
d
e
f
3
g
h
i
4
j
k
l
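
For reference, the block-joining loop in main() can be condensed into a small helper. This is only a sketch: the name join_blocks and the inline sample data are assumptions made for illustration, not part of the original post. It is meant to follow the same rules as the loop (a line starting with a digit or '@' opens a new block, blank lines are skipped but still counted, and the 1-based line number is appended to the block header).

```rust
// Hypothetical helper (not in the original post): joins the lines of each
// block with "__SEP__" and separates the blocks with '\n'.
fn join_blocks(data: &str) -> String {
    const SEP: &str = "__SEP__";
    let mut messages = String::new();
    let mut in_block = false; // becomes true once the first block header is seen

    for (idx, line) in data.lines().enumerate() {
        if line.is_empty() {
            continue; // blank lines are skipped but still counted in idx
        }
        let starts_block = line
            .chars()
            .next()
            .map_or(false, |ch| ch.is_numeric() || ch == '@');

        if starts_block {
            if in_block {
                messages.push('\n'); // close the previous block
            }
            in_block = true;
            messages.push_str(line);
            messages.push_str(SEP);
            messages.push_str(&format!("Lnumber {}", idx + 1));
        } else if in_block {
            messages.push_str(SEP);
            messages.push_str(line);
        }
    }
    messages
}

fn main() {
    // Assumed sample: the first two blocks of inp.log.
    let data = "1\na\nb\nc\n2\nd\ne\nf\n";
    for block in join_blocks(data).split('\n') {
        println!("{}", block);
    }
}
```

Keeping this step in its own function makes it possible to check the intermediate string separately from the parallel processing.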

I was able to resolve the issue by using par_lines() instead, together with map() and collect(), as follows:

    let tmp: Vec<_> = messages.par_lines().map(|l| proc_len(l)).collect();
...
...
...

fn proc_len(inp: &str) -> Vec<usize> {
    vec![inp.len()]
}

