Is it possible to use VMX CPU instructions inside VM?

Is it possible for a process inside a VM guest to use the VMX (AMD-V, VT-x) CPU instructions, which are then processed by the outer VMM instead of directly by the CPU?

Edit: Assume that the outer VMM uses VMX itself to manage its virtual guest machine (i.e. it runs in Ring -1).

If it is possible, are there any implementations of VMMs that support emulating/intercepting VMX calls (VMware, Parallels, KVM,...)?

Neither Intel's VT-x nor AMD's AMD-V supports fully recursive virtualization in hardware, where the CPU would keep a hierarchy of nested virtualized environments in the same fashion as a call / ret pair.

A logical processor supports only two modes of operation: the host mode (called VMX root mode in Intel terminology, hypervisor mode in AMD's) and the guest mode (called as such in AMD's manuals and VMX non-root mode in Intel's).
This implies a flattened hierarchy where every virtualized environment is treated the same by the CPU - the CPU is unaware of how many levels deep the hierarchy of VMs is.

An attempt to use the virtualization instructions themselves inside a guest will yield control to the monitor (VMM).
However, hardware support for accelerating frequently used virtualization instructions has appeared recently, making nested VMs practical.

I'll try to analyse the issues that must be faced to implement nested virtualization.
I'm not dealing with the whole thing - I'm considering only the base case, leaving out everything that deals with virtualizing the hardware, a part that is itself as problematic as virtualizing the software.

Note
I'm not an expert on virtualization technology and have no hands-on experience with it - corrections are welcome.
The purpose of this answer is to convince the reader, conceptually, that nested virtualization is possible and to outline the problems it raises.

VT-x

A logical processor enters VMX operation by executing vmxon - as soon as the mode is entered the processor is in root mode.
Root mode is the mode of the VMM: from it the VMM can launch, resume and handle the VMs.

The VMM then sets the current VMCS (VM Control Structure) with vmptrld - the VMCS contains all the metadata necessary to virtualise a guest.
The VMCS is read and written not with direct memory accesses but with vmread and vmwrite instructions.

Finally, the VMM executes vmlaunch to start executing the guest.
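
The sequence above can be sketched as a toy state machine. This is only a Python model for illustration, not real VMX code: the class, the mode names and the dictionary standing in for a VMCS are all assumptions, and real VMCS fields are identified by numeric encodings rather than strings.

```python
class LogicalProcessor:
    """Toy model of one logical processor's VMX modes (not real hardware)."""

    def __init__(self):
        self.mode = "vmx-off"      # "vmx-off", "root" or "non-root"
        self.current_vmcs = None   # stands in for the current-VMCS pointer

    def _require_root(self):
        if self.mode != "root":
            raise RuntimeError("modelled as legal only in VMX root mode")

    def vmxon(self):
        if self.mode != "vmx-off":
            raise RuntimeError("already in VMX operation")
        self.mode = "root"

    def vmptrld(self, vmcs):
        self._require_root()
        self.current_vmcs = vmcs

    def vmwrite(self, field, value):
        self._require_root()
        self.current_vmcs[field] = value

    def vmread(self, field):
        self._require_root()
        return self.current_vmcs[field]

    def vmlaunch(self):
        self._require_root()
        if self.current_vmcs is None:
            raise RuntimeError("no current VMCS loaded")
        self.mode = "non-root"


cpu = LogicalProcessor()
cpu.vmxon()                        # enter VMX operation, in root mode
guest_vmcs = {}                    # hypothetical stand-in for a VMCS region
cpu.vmptrld(guest_vmcs)            # make it the current VMCS
cpu.vmwrite("guest_rip", 0x1000)   # fields are set through vmwrite
rip = cpu.vmread("guest_rip")
cpu.vmlaunch()                     # the guest now runs in non-root mode
```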

Incepting a VM

Now the logical processor is executing in a virtualized environment.
Suppose the guest is a VMM itself and let's call this the non-root VMM - it needs to repeat the steps above.

But Intel is clear in its manuals (Manual 3 - Chapter 25.1.2):

The following instructions cause VM exits when they are executed in VMX non-root operation:
[...]
This is also true of instructions introduced with VMX, which include:
[...], VMLAUNCH, VMPTRLD, [...] and VMXON

So the vmxon executed by the non-root VMM causes a VM exit: the root VMM resumes at the host entry point stored in the VMCS (the host RIP field, not the instruction after its last vmlaunch), can inspect the VMCS for the exit reason and take appropriate action.
I'm not a seasoned VMM writer, so I'm not sure exactly what the root VMM has to do to emulate this instruction. Executing vmxon while already in VMX root mode will fail, and doing a vmxoff followed by a vmxon with the VMXON region given by the non-root VMM looks like a security vulnerability (or a path to one), so I believe all the root VMM has to do is record that the guest is now in "VMX root mode".
The quotes are necessary here: this mode exists only in software. When the root VMM hands control back to the non-root VMM, the CPU will be in VMX non-root mode.
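
A minimal sketch of that bookkeeping, assuming the root VMM keeps one small record per guest. The NestedVmxState structure, the handler name and the returned strings are hypothetical; a real VMM would also validate the VMXON region and the guest's control registers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NestedVmxState:
    """Per-guest record kept by the root VMM (hypothetical structure)."""
    virtual_vmx_root: bool = False       # the guest believes it is in root mode
    vmxon_region: Optional[int] = None   # address the guest passed to vmxon


def handle_vmxon_exit(state: NestedVmxState, region_addr: int) -> str:
    """Emulate a guest vmxon purely in software; the CPU stays in non-root mode."""
    if state.virtual_vmx_root:
        # Mirrors the hardware rule that vmxon fails in VMX root mode.
        return "inject-failure"
    state.virtual_vmx_root = True
    state.vmxon_region = region_addr
    return "resume-guest"


state = NestedVmxState()
first = handle_vmxon_exit(state, 0x5000)
second = handle_vmxon_exit(state, 0x6000)
```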

After that, the non-root VMM will attempt to use vmptrld to set the current VMCS.
vmptrld will induce a VM exit and the root VMM is in control once again.
If the CPU doesn't support VMCS shadowing, the root VMM has to record that the pointer given by the non-root VMM is now the current VMCS.
If the CPU does support VMCS shadowing, the VMM sets the VMCS link pointer field of its own VMCS (the one used to virtualise the non-root VMM) to the VMCS given by the non-root VMM.
One way or another the VMM knows which virtualised VMCS is active.
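
The two cases can be sketched as a single handler. This follows the simplified description above; the dictionaries and the field name vmcs_link_pointer are stand-ins chosen for this example, and address translation between guest and host is ignored entirely.

```python
def handle_vmptrld_exit(root_vmcs, nested, guest_vmcs_addr, shadowing):
    """Record which virtualised VMCS the non-root VMM made current."""
    # Always remember the pointer in software.
    nested["current_vmcs"] = guest_vmcs_addr
    if shadowing:
        # With VMCS shadowing, also point the root VMM's own VMCS at it.
        root_vmcs["vmcs_link_pointer"] = guest_vmcs_addr


# Case 1: CPU with VMCS shadowing.
root_vmcs_a, nested_a = {}, {}
handle_vmptrld_exit(root_vmcs_a, nested_a, 0x7000, shadowing=True)

# Case 2: CPU without VMCS shadowing.
root_vmcs_b, nested_b = {}, {}
handle_vmptrld_exit(root_vmcs_b, nested_b, 0x7000, shadowing=False)
```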

vmread and vmwrite executed by the non-root VMM may or may not cause a VM exit, depending on VMCS shadowing.
If VMCS shadowing is active, the CPU won't do a VM Exit and will instead access the VMCS pointed to by the VMCS link pointer in the active VMCS (called the shadow VMCS).
This will speed up virtualization of nested VMs.
If VMCS shadowing is not active the CPU will VM exit and the root VMM has to emulate the read/write.
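
The difference between the two paths can be modelled as follows. It is a sketch under the assumption that a VMCS is a plain dictionary; the exits list only exists to make the VM exits visible in this example.

```python
def guest_vmread(field, shadowing_active, shadow_vmcs, software_vmcs, exits):
    """Model a vmread executed by the non-root VMM."""
    if shadowing_active:
        # Fast path: the CPU reads the shadow VMCS itself, no VM exit.
        return shadow_vmcs[field]
    # Slow path: VM exit, then the root VMM emulates the read in software.
    exits.append(("VMREAD", field))
    return software_vmcs[field]


exits = []
vmcs = {"guest_rip": 0x1000}
fast = guest_vmread("guest_rip", True, vmcs, vmcs, exits)
slow = guest_vmread("guest_rip", False, vmcs, vmcs, exits)
```

Both calls return the same value; only the second one costs a round trip through the root VMM.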

Finally, the non-root VMM will launch its VM - this is a nested VM.
vmlaunch will trigger a VM Exit.
The root VMM has to do a few things:

  • Save its VMCS somewhere.
  • Merge the current VMCS and the non-root VMM's VMCS. Since the VMCS controls, for example, which events cause a VM Exit, the merged VMCS must hold the union of the two in this regard.
  • Load the merged VMCS as the CPU's current one.
  • Do a vmlaunch / vmresume.
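
The four steps above can be sketched like this. The bit names and the dictionary layout are hypothetical and do not match the real VMCS control encodings; the only point is that the merged exit controls are the bitwise OR of both sets.

```python
# Hypothetical control bits, not the real VMCS encodings.
EXIT_ON_HLT = 1 << 0
EXIT_ON_CPUID = 1 << 1
EXIT_ON_IO = 1 << 2


def emulate_vmlaunch(root_vmm, nested_vmcs):
    """Emulate a vmlaunch issued by the non-root VMM."""
    # 1. Save the VMCS the root VMM uses for the non-root VMM.
    root_vmm["saved_vmcs"] = root_vmm["current_vmcs"]
    # 2. Merge: guest state comes from the nested VMCS, exits are the union.
    merged = dict(nested_vmcs)
    merged["exit_controls"] = (
        root_vmm["saved_vmcs"]["exit_controls"] | nested_vmcs["exit_controls"]
    )
    # 3. Make the merged VMCS the current one.
    root_vmm["current_vmcs"] = merged
    # 4. Stand-in for the real vmlaunch / vmresume.
    root_vmm["running"] = "nested VM"
    return merged


root_vmm = {
    "current_vmcs": {"exit_controls": EXIT_ON_CPUID | EXIT_ON_IO},
    "running": "non-root VMM",
}
nested_vmcs = {"exit_controls": EXIT_ON_HLT | EXIT_ON_CPUID, "guest_rip": 0x2000}
merged = emulate_vmlaunch(root_vmm, nested_vmcs)
```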

Inside the dream

Now the CPU is executing the nested VM (a VVM - Virtual VM?).
What happens when a sensitive instruction or an event causes a VM Exit?

From the processor's point of view, there are only two levels of virtualization: the root VMX mode and the non-root VMX mode.
Since the guest is in non-root VMX mode, control is transferred back to the root VMX mode code - i.e. the root VMM.

The root VMM must now determine whether the event comes from its VM or from its VM's VM.
This can be done by tracking the use of vmlaunch / vmresume and checking the bits in the VMCS.

If the VM Exit is directed to the non-root VMM, the root VMM has to load its original VMCS, set the VMCS link pointer in it to the non-root VMM's VMCS if shadowing is in use, update the status fields of the non-root VMM's VMCS and do a vmresume.
If the VM Exit is directed to the root VMM itself, it handles it like any other VM Exit.
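
A sketch of that routing decision, assuming the root VMM tracks whether a nested VM is currently running and which exit reasons the non-root VMM asked for. The function name and the string exit reasons are illustrative only.

```python
def route_vm_exit(reason, nested_vm_running, nested_intercepts):
    """Decide which VMM should handle a VM exit received by the root VMM."""
    if nested_vm_running and reason in nested_intercepts:
        # The non-root VMM asked for this exit: reflect it to that VMM.
        return "non-root VMM"
    # Either the exit came from the non-root VMM itself, or only the
    # root VMM is interested in it.
    return "root VMM"


reflected = route_vm_exit("HLT", True, {"HLT"})
kept = route_vm_exit("CPUID", True, {"HLT"})
from_l1 = route_vm_exit("HLT", False, {"HLT"})
```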

A dream within a dream within a dream

What if we want to create a VM inside the nested VM? Kind of a Virtual Virtual VM (VVVM).

There are two things to notice:

  1. The root VMM is still the one invoked on every VM Exit.
    Even if the VVVM is three levels deep, the non-root-non-root VMM is neither the first nor the only manager involved in virtualising it.
    From a security point of view, the root VMM is the weak link.
  2. The hardware doesn't really support arbitrarily deep nesting.
    A VMM may not need much extra effort to go from supporting one level of nesting to n levels (again, I'm not seasoned here), but special support as outlined above is still needed.
    It is not as easy as launching the VM and letting the CPU take care of everything else.
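
Under the same union rule used for one level of nesting, the set of exits the hardware must deliver for an n-level stack can be sketched as a fold over every level's requested exits. This is an assumption extrapolated from the single-level case, with made-up exit names, and it says nothing about how each exit is then routed between the levels.

```python
def hardware_intercepts(levels):
    """Union of the exit reasons requested by every VMM in the stack."""
    return set().union(*levels)


# One set per VMM, outermost (root) first; the names are illustrative.
levels = [{"CPUID", "IO"}, {"HLT", "CPUID"}, {"RDTSC"}]
combined = hardware_intercepts(levels)
```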

AMD-V

AMD-V has no explicit root vs non-root mode to enter: the CPU simply starts executing a VM with vmrun, which takes a pointer to a VMCB (VM Control Block) that serves the same purpose as Intel's VMCS.
After a vmrun the CPU is in guest mode.

The VMCB is cached, but unlike the VMCS it is accessed with ordinary memory accesses.
The vmload / vmsave instructions explicitly load into and save from the cache the VMCB fields subject to caching.

This interface is simpler than Intel's but just as powerful - even when it comes to nested virtualization.

Assume we are inside a VM and the code executes a vmrun - thus we are virtualizing a VMM.

Technically, a VMM can choose whether vmrun will or will not trigger a VM Exit.
In practice, however, AMD-V currently requires the former to always be the case:

The following conditions are considered illegal state combinations: [...]
* The VMRUN intercept bit is clear

Thus the root VMM (I'll use the same terminology as in the Intel case) will gain control and has to emulate the vmrun (since the hardware only supports a single level of virtualisation).
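
The quoted rule can be expressed as a one-line consistency check. This sketch models the VMCB intercepts as a set of instruction names, which is an assumption made for readability; the real VMCB stores them as bits in its control area.

```python
def vmrun_intercept_is_legal(intercepts):
    """A VMCB whose VMRUN intercept is clear is an illegal state combination."""
    return "VMRUN" in intercepts


legal = vmrun_intercept_is_legal({"VMRUN", "HLT"})
illegal = vmrun_intercept_is_legal({"HLT"})
```

Because the intercept must be set, a vmrun executed inside a guest always ends up in the root VMM.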

The root VMM can save and merge the current VMCB with the non-root VMM VMCB and go ahead with the vmrun as in the Intel case.

Upon an exit, the root VMM has to determine whether the exit was directed to it or to the non-root VMM; again, this can be done by tracking the vmrun and the control bits in the VMCB.

Dreaming again

We have set up a VM inside a VM relatively easily - now what happens upon a VM Exit?
The root VMM receives the exit and, if it is directed to the non-root VMM, it has to restore its original VMCB and resume the run (i.e. use vmrun with its original VMCB).
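
The whole AMD round trip can be sketched in the same toy style. All names, the dictionary layout and the string exit codes are hypothetical; a real VMM would also copy guest state and exit information between the VMCBs.

```python
def emulate_vmrun(root, nested_vmcb):
    """Root VMM emulating a vmrun issued by the non-root VMM."""
    root["saved_vmcb"] = root["current_vmcb"]
    merged = dict(nested_vmcb)
    # As in the Intel case, the merged intercepts are the union of both.
    merged["intercepts"] = root["saved_vmcb"]["intercepts"] | nested_vmcb["intercepts"]
    root["current_vmcb"] = merged
    root["nested_vmcb"] = nested_vmcb


def handle_exit(root, exit_code):
    """Route an exit taken while the nested VM was running."""
    nested_vmcb = root.get("nested_vmcb")
    if nested_vmcb is not None and exit_code in nested_vmcb["intercepts"]:
        nested_vmcb["exit_code"] = exit_code            # report it to the non-root VMM
        root["current_vmcb"] = root.pop("saved_vmcb")   # restore the original VMCB
        root["nested_vmcb"] = None
        return "vmrun non-root VMM"
    return "handled by root VMM"


root = {"current_vmcb": {"intercepts": {"VMRUN", "IO"}}}
original_vmcb = root["current_vmcb"]
nested_vmcb = {"intercepts": {"VMRUN", "HLT"}}
emulate_vmrun(root, nested_vmcb)
active_intercepts = root["current_vmcb"]["intercepts"]
own = handle_exit(root, "IO")         # only the root VMM asked for this exit
reflected = handle_exit(root, "HLT")  # the non-root VMM asked for this one
```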

AMD-V supports fast virtualisation of the vmsave and vmload instructions by treating their addresses as guest addresses, which are thus subject to the usual nested-paging translation.

Bringing up Inception again

As in the Intel case, the virtualization can be nested again as long as the VMM supports that feature.

The critical security warning noted for Intel's case is valid for AMD's as well.


Footnote on the VMCS: it is not accessed with direct memory accesses because of its implementation-defined format and the fact that the memory area can be used just as a spill area that is not updated in real time.

After reading a lot more about virtualization, I stumbled onto this ticket for VirtualBox.

It is a feature request for exactly this functionality and, judging by the comments, VMware Workstation has already implemented it, so it must indeed be possible.
