Use par_split on a String, process using rayon and collect result in a Vector
I am trying to read a file into a string, messages, defined on line 14. The file contains several blocks, where each block starts with a number. After I read the file contents into messages, the blocks are separated by newlines and the lines within a block are separated by __SEP__. I would like to use par_split() on messages, process each block using rayon, and collect the output from each block into a vector vec_final, e.g. by calling collect() on line 54 or some similar mechanism, so that vec_final contains the vec_local produced on line 53 by each block. Any pointers on how I can achieve this are highly appreciated.
My code is as follows:
use rayon::prelude::*; // brings par_split() and the other parallel string methods into scope

// Returns true if the first character of inp is a digit or '@'.
fn starts_with_digit_or_at_sign(inp: &str) -> bool {
    if let Some(ch) = inp.chars().next() {
        return ch.is_numeric() || ch == '@';
    }
    false
}
fn main() {
    let filepath = "inp.log";
    let data = std::fs::read_to_string(filepath).expect("file not found!");
    let mut messages = String::new();
    let separator_char = '\n'; // separates blocks
    let separator = String::from("__SEP__"); // separates lines within a block
    let mut found_first_message = false;
    let mut start_of_new_msg = false;
    let mut line_num = 0;
    for line in data.lines() {
        line_num += 1;
        if !line.is_empty() {
            if starts_with_digit_or_at_sign(line) {
                // A new block begins; every block after the first is preceded by a newline.
                start_of_new_msg = true;
                if !found_first_message {
                    found_first_message = true;
                } else {
                    messages.push(separator_char);
                }
            }
            if found_first_message {
                if !start_of_new_msg {
                    messages.push_str(&separator);
                }
                messages.push_str(line);
                if start_of_new_msg {
                    // Record the source line number right after the block header.
                    start_of_new_msg = false;
                    let mut tmp = String::from("Lnumber ");
                    tmp.push_str(&line_num.to_string());
                    messages.push_str(&separator);
                    messages.push_str(&tmp);
                }
            }
        }
    }
    messages.par_split(separator_char).for_each(|l| {
        println!(
            "line: '{}' len: {}, {}",
            l,
            l.len(),
            rayon::current_num_threads()
        );
        let vec_local: Vec<i32> = vec![l.len() as i32];
    }); // <-- line 54
}
The output produced by the code is as follows:
line: '1__SEP__Lnumber 1__SEP__a__SEP__b__SEP__c' len: 41, 8
line: '3__SEP__Lnumber 9__SEP__g__SEP__h__SEP__i' len: 41, 8
line: '2__SEP__Lnumber 5__SEP__d__SEP__e__SEP__f' len: 41, 8
line: '4__SEP__Lnumber 13__SEP__j__SEP__k__SEP__l' len: 42, 8
File inp.log is as follows:
1
a
b
c
2
d
e
f
3
g
h
i
4
j
k
l
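For reference, the block rule used on this file (a block starts at a line whose first character is a digit or '@') can also be sketched without building a separator-joined string. This is only an illustration under assumptions: starts_block and group_blocks are hypothetical names, and unlike the code above this sketch does not add the "Lnumber N" field.

```rust
/// Returns true when the first character is a digit or '@' (the block-start rule).
fn starts_block(line: &str) -> bool {
    line.chars()
        .next()
        .map_or(false, |ch| ch.is_numeric() || ch == '@')
}

/// Groups non-empty lines into blocks; lines before the first block start are dropped.
fn group_blocks(data: &str) -> Vec<Vec<&str>> {
    let mut blocks: Vec<Vec<&str>> = Vec::new();
    for line in data.lines().filter(|l| !l.is_empty()) {
        if starts_block(line) {
            // A block header opens a new group.
            blocks.push(vec![line]);
        } else if let Some(current) = blocks.last_mut() {
            // Any other line belongs to the most recent block.
            current.push(line);
        }
    }
    blocks
}

fn main() {
    let data = "1\na\nb\nc\n2\nd\ne\nf\n";
    let blocks = group_blocks(data);
    println!("{:?}", blocks);
}
```

A Vec of blocks like this could then be handed to rayon with par_iter() rather than par_split(), which avoids choosing separator strings that must never occur in the data.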
I was able to resolve the issue by using par_lines() instead, as follows:
let tmp: Vec<_> = messages.par_lines().map(|l| proc_len(l)).collect();
...
...
...
fn proc_len(inp: &str) -> Vec<usize> {
    vec![inp.len()]
}