Where is the context switching finally happening in the Linux kernel source?
In Linux, process scheduling occurs after an interrupt returns (the timer interrupt or any other interrupt) or when a process voluntarily relinquishes the CPU (by explicitly calling schedule()). Today I was trying to see where the context switch happens in the Linux source (kernel version 2.6.23).
(I think I checked this several years ago, but I'm not sure now. I was looking at the sparc arch then.)
I started from main_timer_handler (in arch/x86_64/kernel/time.c), but couldn't find it there.
Finally I found it in ./arch/x86_64/kernel/entry.S:
ENTRY(common_interrupt)
	XCPT_FRAME
	interrupt do_IRQ
	/* 0(%rsp): oldrsp-ARGOFFSET */
ret_from_intr:
	cli
	TRACE_IRQS_OFF
	decl %gs:pda_irqcount
	leaveq
	CFI_DEF_CFA_REGISTER rsp
	CFI_ADJUST_CFA_OFFSET -8
exit_intr:
	GET_THREAD_INFO(%rcx)
	testl $3,CS-ARGOFFSET(%rsp)
	je retint_kernel
	...(omitted)
	GET_THREAD_INFO(%rcx)
	jmp retint_check

#ifdef CONFIG_PREEMPT
	/* Returning to kernel space. Check if we need preemption */
	/* rcx: threadinfo. interrupts off. */
ENTRY(retint_kernel)
	cmpl $0,threadinfo_preempt_count(%rcx)
	jnz retint_restore_args
	bt $TIF_NEED_RESCHED,threadinfo_flags(%rcx)
	jnc retint_restore_args
	bt $9,EFLAGS-ARGOFFSET(%rsp)	/* interrupts off? */
	jnc retint_restore_args
	call preempt_schedule_irq
	jmp exit_intr
#endif
	CFI_ENDPROC
END(common_interrupt)
At the end of the ISR path there is a call to preempt_schedule_irq, which is defined in kernel/sched.c as below (it calls schedule() in the middle):
/*
 * this is the entry point to schedule() from kernel preemption
 * off of irq context.
 * Note, that this is called and return with irqs disabled. This will
 * protect us against recursive calling from irq.
 */
asmlinkage void __sched preempt_schedule_irq(void)
{
	struct thread_info *ti = current_thread_info();
#ifdef CONFIG_PREEMPT_BKL
	struct task_struct *task = current;
	int saved_lock_depth;
#endif
	/* Catch callers which need to be fixed */
	BUG_ON(ti->preempt_count || !irqs_disabled());

need_resched:
	add_preempt_count(PREEMPT_ACTIVE);
	/*
	 * We keep the big kernel semaphore locked, but we
	 * clear ->lock_depth so that schedule() doesnt
	 * auto-release the semaphore:
	 */
#ifdef CONFIG_PREEMPT_BKL
	saved_lock_depth = task->lock_depth;
	task->lock_depth = -1;
#endif
	local_irq_enable();
	schedule();
	local_irq_disable();
#ifdef CONFIG_PREEMPT_BKL
	task->lock_depth = saved_lock_depth;
#endif
	sub_preempt_count(PREEMPT_ACTIVE);

	/* we could miss a preemption opportunity between schedule and now */
	barrier();
	if (unlikely(test_thread_flag(TIF_NEED_RESCHED)))
		goto need_resched;
}
So I found where scheduling occurs, but my question is: where in the source code does the actual context switch happen? For a context switch, the stack, mm settings, and registers must be switched, and the PC (program counter) must be set to the new task. Where can I find the source code for that? I followed schedule() --> context_switch() --> switch_to(). Below is the context_switch function, which calls switch_to() (kernel/sched.c):
/*
 * context_switch - switch to the new MM and the new
 * thread's register state.
 */
static inline void
context_switch(struct rq *rq, struct task_struct *prev,
	       struct task_struct *next)
{
	struct mm_struct *mm, *oldmm;

	prepare_task_switch(rq, prev, next);
	mm = next->mm;
	oldmm = prev->active_mm;
	/*
	 * For paravirt, this is coupled with an exit in switch_to to
	 * combine the page table reload and the switch backend into
	 * one hypercall.
	 */
	arch_enter_lazy_cpu_mode();

	if (unlikely(!mm)) {
		next->active_mm = oldmm;
		atomic_inc(&oldmm->mm_count);
		enter_lazy_tlb(oldmm, next);
	} else
		switch_mm(oldmm, mm, next);

	if (unlikely(!prev->mm)) {
		prev->active_mm = NULL;
		rq->prev_mm = oldmm;
	}
	/*
	 * Since the runqueue lock will be released by the next
	 * task (which is an invalid locking op but in the case
	 * of the scheduler it's an obvious special-case), so we
	 * do an early lockdep release here:
	 */
#ifndef __ARCH_WANT_UNLOCKED_CTXSW
	spin_release(&rq->lock.dep_map, 1, _THIS_IP_);
#endif

	/* Here we just switch the register state and the stack. */
	switch_to(prev, next, prev);	/* <---- this line */

	barrier();
	/*
	 * this_rq must be evaluated again because prev may have moved
	 * CPUs since it called schedule(), thus the 'rq' on its stack
	 * frame will be invalid.
	 */
	finish_task_switch(this_rq(), prev);
}
'switch_to' is assembly code under include/asm-x86_64/system.h. My question is: does the processor switch to the new task inside the switch_to() function? Then do 'barrier(); finish_task_switch(this_rq(), prev);' run at some later time? By the way, this was in interrupt context, so if switch_to() is effectively the end of this ISR, who finishes the interrupt? Or, if finish_task_switch does run, how does the new task come to occupy the CPU? I would really appreciate it if someone could explain and clarify this for me.
Almost all of the work for a context switch is done by the normal SYSCALL/SYSRET mechanism. The process pushes its state onto the stack of "current", the currently running process. Calling do_sched_yield just changes the value of current, so the return simply restores the state of a different task.
Preemption is trickier, since it doesn't happen at a normal boundary. The preemption code has to save and restore all of the task state, which is slow. That's why non-RT kernels avoid preemption. The arch-specific switch_to code is what saves all of the prev task's state and sets up the next task's state so that SYSRET will run the next task correctly. There are no magic jumps or anything in the code; it is just setting up the hardware for userspace.