简体   繁体   中英

why does perf stat show “stalled-cycles-backend” as <not supported>?

Running perf stat ls shows this:

Performance counter stats for 'ls':

          1.388670 task-clock                #    0.067 CPUs utilized          
                 2 context-switches          #    0.001 M/sec                  
                 0 cpu-migrations            #    0.000 K/sec                  
               266 page-faults               #    0.192 M/sec                  
           3515391 cycles                    #    2.531 GHz                    
           2096636 stalled-cycles-frontend   #   59.64% frontend cycles idle   
   <not supported> stalled-cycles-backend  
           2927468 instructions              #    0.83  insns per cycle        
                                             #    0.72  stalled cycles per insn
            615636 branches                  #  443.328 M/sec                  
             22172 branch-misses             #    3.60% of all branches        

       0.020657192 seconds time elapsed

Why is stalled-cycles-backend shown as "not supported"? What kind of CPU, hardware, kernel or user-space software do I need to see this value?

Currently tried this on RHEL with Linux 3.12 for x86_64, with matching "perf" version, on different Intel Core i5 and i7 systems (Ivy Bridge type). None of them support stalled-cycles-backend.

Some more info:

$ perf list | grep stalled
  stalled-cycles-frontend OR idle-cycles-frontend    [Hardware event]
  stalled-cycles-frontend OR cpu/stalled-cycles-frontend/ [Kernel PMU event]

$ ls /sys/devices/cpu/events/
branch-instructions  bus-cycles    cache-references  instructions  mem-stores
branch-misses        cache-misses  cpu-cycles        mem-loads     stalled-cycles-frontend

$ cat /sys/bus/event_source/devices/cpu/events/stalled-cycles-frontend
event=0x0e,umask=0x01,inv,cmask=0x01

Edit: just tried this on an AMD Phenom II X6 1045T CPU, under Ubuntu 12.04 with Linux 3.2 (32bit) - and here it does show values for both stalled-cycles-frontend and stalled-cycles-backend.

Looks like perf has not been updated to understand all the performance monitoring events that Ivy Bridge supports. Fortunately there is a generic, albeit cumbersome, interface that allows you to access the full list of performance monitoring events. I didn't see stalled-cycles-backend in the list when I gave it a quick look, but maybe I missed, or maybe they have broken it down by all the different events that could stall the backend.

We start with

perf list --help

...shows the following NOTE

    1. Intel(R) 64 and IA-32 Architectures Software Developer's Manual
       Volume 3B: System Programming Guide
       http://www.intel.com/Assets/PDF/manual/253669.pdf

...armed with that URL you end up in

http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3b-part-2-manual.pdf

...you want section 19.3

19.3 PERFORMANCE MONITORING EVENTS FOR 3RD GENERATION INTEL® CORE™ PROCESSORS 3rd generation Intel® Core™ processors and Intel Xeon processor E3-1200 v2 product family are based on Intel microarchitecture code name Ivy Bridge. They support architectural performance-monitoring events listed in Table 19-1. Non-architectural performance-monitoring events in the processor core are listed in Table 19-5. The events in Table 19-5 apply to processors with CPUID signature of DisplayFamily_DisplayModel encoding with the following values: 06_3AH.

...so for architectural events you need Table 19-1

19.1 ARCHITECTURAL PERFORMANCE-MONITORING EVENTS Architectural performance events are introduced in Intel Core Solo and Intel Core Duo processors. They are also supported on processors based on Intel Core microarchitecture. Table 19-1 lists pre-defined architectural performance events that can be configured using general-purpose performance counters and associated event-select registers.

**Table 19-1. Architectural Performance Events

在此输入图像描述

在此输入图像描述

... now comes the tricky part, you take the UMask Value as the upper 2 hex digits and the Event Num is the lower 2 hex digits of a 4 hex digit hardware register number to be given to perf stat .

perf stat --help
  -e, --event= Select the PMU event. Selection can be a symbolic event name (use perf list to list all events) or a raw PMU event (eventsel+umask) in the form of rNNN where NNN is a hexadecimal event descriptor. 

... it says NNN but you can give it NNNN . Let's verify that this works, let's ask perf stat for cache-misses both as a symbolic event name and as a hex number from table 19-1. We'll invoke the date command for simplicity.

$ perf stat -e r412e -e cache-misses date

Fri Mar 28 09:28:52 CDT 2014

Performance counter stats for 'date':

          2292 r412e                                                       
          2292 cache-misses                                                

   0.003322663 seconds time elapsed

$ 

As you can see both reported the same number, so far so good. Now we go to Table 19-5 for the non-architectural hardware registers, of which there are too many too list here, but I'll list a few:

在此输入图像描述

The perf (or its in-kernel part) was not updated to support your CPU, so perf is unable to map generic event name "stalled-cycles-backend" to actual HW event.

In such case it can be easier to find event names; eg for Intel CPUs - from Intel's optimization manual http://www.intel.com/content/dam/doc/manual/64-ia-32-architectures-optimization-manual.pdf (which groups events by type and explains how to use them to measure various parts). Don't have similar document for AMD.

To use event names with perf without manual conversion into raw event ids (like amdn says in his answer ), you can use converter scripts showevtinfo and check_events from perfmon2 ( libpfm4 ; examples folder), as explained in the article " How to monitor the full range of CPU performance events " by Bojan Nikolic http://www.bnikolic.co.uk/blog/hpc-prof-events.html . perfmon2 knows AMD and Intel CPUs, and written in C/C++

For Intel CPUs the easiest way is to use ocperf wrapper over perf from Intel's open source python project by Andi Kleen "pmu-tools" hosted at github https://github.com/andikleen/pmu-tools and introduced here in ML: https://lwn.net/Articles/556983/ and in Andi's blog http://halobates.de/blog/p/245

The ocperf understands all intel event names from Intel's optimization manual.

ocperf will also support every HW event with older linux kernels. It has its own database in tsv or json format with all HW events and their codes at https://download.01.org/perfmon/ (there is auto-downloader in pmu-tools), and the database is constantly updated by Intel's employers. Format of database is documented in readme: https://download.01.org/perfmon/readme.txt

For Sandy Bridge/Ivy Bridge or Haswell, and kernels 3.10 or newer, you can also use toplev.py script from "pmu-tools" to investigate performance. Here is description from its author, Andi Kleen, http://halobates.de/blog/p/262 " pmu-tools, part II: toplev " based on "TopDown" method from Ahmad Yasin " How to Tune Applications Using a Top-Down Characterization of Microarchitectural Issues and " Top Down Analysis. Never lost with performance counters "

Just found Re: perf, x86: Add parts of the remaining haswell PMU functionality :

> AFAICS backend stall cycles are documented to work on Ivy Bridge.

I'm not aware of any documentation that presents these events
as accurate frontend/backend stalls without using the full
TopDown methology (Optimization manual B.3.2)

So IIUC stalled-cycles-backend counters are too unreliable on Ivy Bridge, and that's why the kernel devs have decided to not support them.

And sure enough, Linux' perf_event_intel.c supports PERF_COUNT_HW_STALLED_CYCLES_BACKEND for Nehalem, Xeon E7 and SandyBridge, but not for IvyBridge. PERF_COUNT_HW_STALLED_CYCLES_FRONTEND is supported for IvyBridge, though.

So I guess there won't be a way to get this counter on my current CPU - either switch CPUs or use the full top-down methodology mentioned in the mail (and described here and here )

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM