Rust Rayon Threads create Garbage
I have a larger program that can be summarized as follows:
SequentialPart
ThreadPoolParallelized
SequentialPart
ParallelPartInQuestion
SequentialPart
This sequence is executed many times in a row.
I'm using Rayon to parallelize the second parallel part (ParallelPartInQuestion) like this:
final_results = (0..num_txns).into_par_iter()
    .filter_map(|idx| {
        if !matches!(ret, None) {
            return None;
        }
        match last_input_output.take_output(idx) {
            ExecutionStatus::Success(t) => Some(t),
            ExecutionStatus::SkipRest(t) => {
                Some(t)
            }
            ExecutionStatus::Abort(err) => {
                None
            }
        }
    }).collect();
I've also tried this using parallel chunks:
let interm_result: Vec<ExtrResult<E>> = (0..num_txns)
    .collect::<Vec<TxnIndex>>()
    .par_chunks(chunk_size)
    .map(|chunk| {
Either way, the first time this code runs, everything works as expected and I get a decent performance boost out of it.
However, from the second iteration onward, the first parallel part (ThreadPoolParallelized) runs around 20% slower every time.
So I concluded that Rayon must somehow leave something behind that has to be cleaned up afterwards, which causes this performance drop.
Is there something I can do about this?
Edit: This is what take_output does:
outputs: Vec<CachePadded<ArcSwapOption<TxnOutput<T, E>>>>, // txn_idx -> output.

pub fn take_output(&self, txn_idx: TxnIndex) -> ExecutionStatus<T, Error<E>> {
    let owning_ptr = self.outputs[txn_idx]
        .swap(None)
        .expect("Output must be recorded after execution");
    if let Ok(output) = Arc::try_unwrap(owning_ptr) {
        output
    } else {
        unreachable!("Output should be uniquely owned after execution");
    }
}
I figured out what was causing the problem. The first parallel part in this execution used a manually created thread pool. However, into_par_iter uses the global thread pool if not otherwise specified and keeps it alive for some time. This interferes with the manually created thread pool.
let interm_result: Vec<ExtrResult<E>> = RAYON_EXEC_POOL.install(|| {
    (0..num_txns)
By specifically wrapping the code that is supposed to be executed in parallel in the pool.install call, it re-uses the same thread pool, doesn't create an additional one that has to be destroyed with an overhead later, and preserves performance.