QEMU-KVM performance optimization

Question

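I am trying to measure the performance difference between QEMU-KVM and the host, but I cannot figure out how to tune my QEMU-KVM setup to achieve near-native performance.

I installed QEMU-KVM on Lubuntu 14.04 and ran a stress micro-benchmark that does not incur any cache misses.

I am using performance monitoring tools to record one performance counter: instructions retired.

Since QEMU does not provide this performance counter, I am using perf on the host system to record the performance of the whole QEMU process.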
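For reference, the host-side measurement described above can be sketched roughly as follows. This is only an illustration under assumptions: the helper name qemu_perf_cmd, the process name qemu-system-x86_64, the example PID, and the measurement window are all hypothetical, and perf stat is used here for plain counting. The helper only prints the command so it can be checked before it is run.

```shell
#!/bin/sh
# Sketch: print the perf command that counts retired instructions for a
# running QEMU process from the host. Nothing is measured here; the
# command is only printed.

# qemu_perf_cmd PID [SECONDS]  (hypothetical helper, not part of perf)
qemu_perf_cmd() {
    pid="$1"
    secs="${2:-10}"   # assumed default measurement window in seconds
    # "perf stat -p PID -- sleep N" counts the given process for N seconds.
    printf 'perf stat -e instructions -p %s -- sleep %s\n' "$pid" "$secs"
}

# Looking up the PID on a real host might look like this (the process
# name is an assumption and depends on the distribution):
#   pid=$(pgrep -f qemu-system-x86_64 | head -n 1)
#   qemu_perf_cmd "$pid" 10

# Placeholder PID, for illustration only.
qemu_perf_cmd 4242 5
```

With the placeholder values this prints "perf stat -e instructions -p 4242 -- sleep 5", which can then be run on the host while the benchmark is executing in the guest.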

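The results I obtain do not reflect near-native performance, and I am not sure how I should set up the QEMU-KVM subsystem.

The details of the QEMU guest and the host (bare metal) are described below.

QEMU emulator version 2.0.0 with KVM as the virtualization environment, and libvirt 1.2.2.

The guest machine runs kernel version 3.19.0-15-generic and the host runs version 3.14.5-031405-generic, on an x86_64 machine.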

The guest machine has an Intel Sandy Bridge processor (model name: Intel Xeon E312xx) with the following settings: sockets=1, cores=1, threads=1, and 4 MB cache.
More details:
cpu family    : 6
model        : 42
max freq        : 2394.560 MHz
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx rdtscp lm constant_tsc rep_good nopl eagerfpu pni pclmulqdq vmx ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx hypervisor lahf_lm vnmi ept xsaveopt


The host machine has an Intel Sandy Bridge processor (Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz) with 4 cores and 6 MB cache.
cpu family    : 6
model        : 42
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid

Please let me know if I can provide more details.

Thanks!

Answer 1

Try using event modifiers, as described in "perf help list":

EVENT MODIFIERS
       Events can optionally have a modifier by appending a colon and one or more
       modifiers. Modifiers allow the user to restrict the events to be counted. The
       following modifiers exist:

           u - user-space counting
           k - kernel counting
           h - hypervisor counting
           G - guest counting (in KVM guests)
           H - host counting (not in KVM guests)
           p - precise level
           S - read sample value (PERF_SAMPLE_READ)
           D - pin the event to the PMU

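IMHO the documentation above is not satisfactory, and I do not completely understand what each event modifier means. Sometimes I get inconsistent results, such as more retired instructions with user-space counting (the "u" modifier) than with full counting (no modifier at all).

I am using the following perf version: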

>> perf --version
perf version 3.13.11-ckt20
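
To compare the modifiers side by side, one option is to count the same event once per modifier in a single perf stat run. The following is a minimal sketch, not a verified recipe: the helper name perf_modifier_cmd, the PID 1234, and the 10-second window are placeholders, and the helper only prints the command instead of running it.

```shell
#!/bin/sh
# Sketch: build a perf stat command that counts retired instructions once
# per modifier, so the counts can be compared in one run.

# perf_modifier_cmd PID MODIFIER...  (hypothetical helper, not part of perf)
# An empty string as MODIFIER means "no modifier" (full counting).
perf_modifier_cmd() {
    pid="$1"
    shift
    events=""
    for m in "$@"; do
        # Append "instructions" or "instructions:<m>" to the comma-separated list.
        events="${events:+$events,}instructions${m:+:$m}"
    done
    # The 10-second window is an assumed placeholder.
    printf 'perf stat -e %s -p %s -- sleep 10\n' "$events" "$pid"
}

# PID 1234 is a placeholder for the QEMU process on the host.
perf_modifier_cmd 1234 "" u k G H
```

This prints "perf stat -e instructions,instructions:u,instructions:k,instructions:G,instructions:H -p 1234 -- sleep 10". Running the printed command on the host while the guest benchmark executes should show how the total splits between guest (G) and host (H) counting, which is the comparison the question is after.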

Answer 1 (score 0): 2015-07-11 09:52:35