On a multi-core machine running Linux, when will the process scheduler migrate a process to another CPU?
In my program, whose RSS is 65 GB, calling fork takes more than 2 seconds in sys_clone->dup_mm->copy_page_range. While the fork executes, one CPU sits at 100% sys, and at the same time one of my threads cannot get CPU time until the fork finishes. The machine has 16 CPUs; the other CPUs are idle.
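For illustration, here is a minimal sketch of the kind of program I mean (not my actual program; SIZE is only a few GiB here and should be scaled to what your machine can hold). It maps a large anonymous region, touches every page so it becomes resident, and then times fork():

/*
 * Hypothetical reproduction sketch, not the real program: the larger the
 * resident set, the more time fork() spends copying page tables
 * (copy_page_range).
 */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define SIZE (4UL << 30)   /* 4 GiB for the sketch; the real program has ~65 GB resident */

int main(void)
{
    char *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }
    memset(p, 1, SIZE);                 /* fault every page in -> large RSS */

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    pid_t pid = fork();                 /* page tables are copied here */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (pid == 0) _exit(0);             /* child exits immediately */
    waitpid(pid, NULL, 0);

    printf("fork() took %.3f s\n",
           (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
    return 0;
}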
So my question is: while one CPU is busy with fork, why doesn't the scheduler migrate the process waiting on this CPU to another idle CPU? In general, when and how does the scheduler migrate processes between CPUs?
I have searched this site, and the existing threads do not answer my question.
rss is 65G, when call fork, sys_clone->dup_mm->copy_page_range will consume more than 2 seconds
While doing fork (or clone), the vmas of the existing process have to be copied into the vmas of the new process. The dup_mm function (kernel/fork.c) creates the new mm and does the actual copy. There are no direct calls to copy_page_range, but I think the static function dup_mmap may be inlined into dup_mm, and it has calls to copy_page_range.
In dup_mmap several locks are taken, both on the new mm and on the old oldmm:
356 down_write(&oldmm->mmap_sem);
After taking the mmap_sem reader/writer semaphore, there is a loop over all mmaps to copy their metainformation:
381 for (mpnt = oldmm->mmap; mpnt; mpnt = mpnt->vm_next)
Only after the loop (which is long in your case) is mmap_sem unlocked:
465 out:
468 up_write(&oldmm->mmap_sem);
While mmap_sem is held by a writer, no other reader or writer can do anything with the mmaps in oldmm.
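To see this from user space, here is a minimal sketch (my own illustration, not the asker's program and not kernel code): one thread repeatedly calls mmap()/munmap() and records its worst-case latency while the main thread faults in a large region and then fork()s. During the fork, the mapper thread's mmap() has to wait behind the writer holding the parent's mmap_sem, so its latency spikes. Compile with gcc -pthread.

/* Sketch of mmap_sem contention between fork() and another thread's mmap(). */
#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define BIG (2UL << 30)          /* 2 GiB is enough to make the stall visible */

static volatile int stop;

static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

static void *mapper(void *arg)
{
    double worst = 0;
    while (!stop) {
        double t0 = now();
        void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);  /* needs mmap_sem */
        munmap(p, 4096);
        double dt = now() - t0;
        if (dt > worst) worst = dt;
    }
    printf("worst mmap+munmap latency: %.3f s\n", worst);
    return NULL;
}

int main(void)
{
    char *big = mmap(NULL, BIG, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (big == MAP_FAILED) { perror("mmap"); return 1; }
    memset(big, 1, BIG);                       /* make fork's copy expensive */

    pthread_t t;
    pthread_create(&t, NULL, mapper, NULL);
    sleep(1);                                  /* let the mapper run freely first */

    pid_t pid = fork();                        /* writer holds mmap_sem while copying */
    if (pid == 0) _exit(0);
    waitpid(pid, NULL, 0);

    stop = 1;
    pthread_join(t, NULL);
    return 0;
}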
one thread cannot get CPU time until the fork finishes ... So my question is: while one CPU is busy with fork, why doesn't the scheduler migrate the process waiting on this CPU to another idle CPU?
Are you sure that the other thread is ready to run and does not want to do anything with the mmaps, for example allocating memory (brk)?
Actually, the wait-cpu thread is my I/O thread, which sends and receives packets from the client. In my observation, the packets are always there, but the I/O thread cannot receive them.
You should check the stack of your wait-cpu thread (there is even a SysRq for this), and what kind of I/O it is doing. mmaping a file is the kind of I/O that will be blocked on mmap_sem by the fork.
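As a sketch of such a check (my own illustration; it assumes /proc is mounted, and /proc/<pid>/task/<tid>/stack needs root and CONFIG_STACKTRACE), you can read the thread's wchan and kernel stack directly instead of using SysRq:

/* Print where a given thread is sleeping in the kernel. */
#include <stdio.h>

static void dump(const char *pid, const char *tid, const char *name)
{
    char path[128], buf[4096];
    snprintf(path, sizeof(path), "/proc/%s/task/%s/%s", pid, tid, name);
    FILE *f = fopen(path, "r");
    if (!f) { perror(path); return; }
    size_t n;
    printf("== %s ==\n", path);
    while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
        fwrite(buf, 1, n, stdout);
    printf("\n");
    fclose(f);
}

int main(int argc, char **argv)
{
    if (argc != 3) { fprintf(stderr, "usage: %s <pid> <tid>\n", argv[0]); return 1; }
    dump(argv[1], argv[2], "wchan");   /* symbol the thread sleeps in */
    dump(argv[1], argv[2], "stack");   /* kernel stack trace, root only */
    return 0;
}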
You can also check the "last used CPU" of the wait-cpu thread, e.g. in the top monitoring utility, by enabling the thread view (the H key) and adding the "Last used CPU" column to the output (f j in older versions; f, scroll to P, Enter in newer ones). I think it is possible that your wait-cpu thread was already on another CPU, just not allowed (not ready) to run.
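If you would rather read it programmatically than from top, here is a sketch (my own, with pid and tid taken from the command line) that prints the 39th field of /proc/<pid>/task/<tid>/stat, which proc(5) documents as "processor", the CPU the thread last executed on:

/* Print the "last used CPU" of a thread from /proc. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <pid> <tid>\n", argv[0]);
        return 1;
    }

    char path[128], buf[4096];
    snprintf(path, sizeof(path), "/proc/%s/task/%s/stat", argv[1], argv[2]);

    FILE *f = fopen(path, "r");
    if (!f) { perror(path); return 1; }
    size_t n = fread(buf, 1, sizeof(buf) - 1, f);
    buf[n] = '\0';
    fclose(f);

    /* Field 2 (comm) may contain spaces, so start counting after the last ')'. */
    char *p = strrchr(buf, ')');
    if (!p) return 1;
    p += 2;                       /* skip ") " -> now at field 3 (state) */

    int field = 3, cpu = -1;
    for (char *tok = strtok(p, " "); tok; tok = strtok(NULL, " "), field++) {
        if (field == 39) { cpu = atoi(tok); break; }   /* field 39 = processor */
    }
    printf("last used CPU: %d\n", cpu);
    return 0;
}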
If you are using fork only to exec, it can be useful to switch to vfork + exec (or just posix_spawn). vfork will suspend your process (but may not suspend your other threads; that is dangerous) until the new process does exec or exit, but execing may be faster than waiting for 65 GB of mmaps to be copied.
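Here is a minimal posix_spawn sketch (the spawned command is only a placeholder; with glibc, posix_spawn is typically implemented with a vfork-style clone, so the parent's page tables are not duplicated):

/* Spawn a helper without duplicating the parent's address space. */
#include <spawn.h>
#include <stdio.h>
#include <sys/wait.h>

extern char **environ;

int main(void)
{
    pid_t pid;
    char *argv[] = { "/bin/echo", "spawned without copying 65 GB of mmaps", NULL };

    int err = posix_spawn(&pid, "/bin/echo", NULL, NULL, argv, environ);
    if (err != 0) {
        fprintf(stderr, "posix_spawn failed: %d\n", err);
        return 1;
    }
    waitpid(pid, NULL, 0);
    return 0;
}

Whether this is applicable depends on whether the child actually needs a copy of the parent's memory before the exec.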