
Programmatically get the cache line size?

All platforms welcome, please specify the platform for your answer.

A similar question: How to programmatically get the CPU cache page size in C++?

On Linux (with a reasonably recent kernel), you can get this information out of /sys:

/sys/devices/system/cpu/cpu0/cache/

This directory has one subdirectory per cache (index0, index1, and so on). Each of those directories contains the following files:

coherency_line_size
level
number_of_sets
physical_line_partition
shared_cpu_list
shared_cpu_map
size
type
ways_of_associativity

This gives you more information about the cache than you'd ever hope to know, including the cache line size (coherency_line_size) as well as which CPUs share this cache. This is very useful if you are doing multithreaded programming with shared data (you'll get better results if the threads sharing data are also sharing a cache).
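As a rough illustration of reading these files from C, here is a minimal sketch. The function names sysfs_cache_line_size and print_cache_line_sizes are made up for this example, and it assumes the index directories are numbered contiguously from index0, so it stops at the first one it cannot read.

```c
#include <stdio.h>

/* Hypothetical helper: reads coherency_line_size for one cache index of one
 * CPU. Returns the line size in bytes, or -1 if the file is missing or
 * cannot be parsed (for example on a non-Linux system). */
long sysfs_cache_line_size(unsigned cpu, unsigned index) {
    char path[128];
    long value = -1;
    FILE *f;

    snprintf(path, sizeof path,
             "/sys/devices/system/cpu/cpu%u/cache/index%u/coherency_line_size",
             cpu, index);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (fscanf(f, "%ld", &value) != 1 || value <= 0)
        value = -1;
    fclose(f);
    return value;
}

/* Prints the line size of every cache index exposed for cpu0.
 * Assumption: indexes are contiguous, so the first failure ends the loop. */
void print_cache_line_sizes(void) {
    unsigned index;
    for (index = 0; ; ++index) {
        long size = sysfs_cache_line_size(0, index);
        if (size < 0)
            break;
        printf("index%u: %ld bytes\n", index, size);
    }
}
```

Calling print_cache_line_sizes() from main prints one line per cache index; on a system without this sysfs tree it prints nothing.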

On Linux, look at sysconf(3).

sysconf (_SC_LEVEL1_DCACHE_LINESIZE)

You can also get it from the command line using getconf:

$ getconf LEVEL1_DCACHE_LINESIZE
64
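A small wrapper around the sysconf call might look like the sketch below. The name l1d_line_size is invented for this example. _SC_LEVEL1_DCACHE_LINESIZE is a glibc extension rather than a POSIX name, so the sketch guards it with #ifdef, and it treats any non-positive result as "unknown" because the call can report 0 or -1 on systems where the value is not available.

```c
#include <unistd.h>

/* Hypothetical wrapper: returns the L1 data cache line size in bytes,
 * or 0 if the C library or the system cannot report it. */
long l1d_line_size(void) {
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    long size = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
    return size > 0 ? size : 0;   /* 0 and -1 both mean "unknown" here */
#else
    return 0;                     /* name not provided by this C library */
#endif
}
```

Callers should be prepared for a return value of 0 and fall back to another method or a conservative default.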

I have been working on some cache line stuff and needed to write a cross-platform function. I committed it to a GitHub repo at https://github.com/NickStrupat/CacheLineSize, or you can just use the source below. Feel free to do whatever you want with it.

#ifndef GET_CACHE_LINE_SIZE_H_INCLUDED
#define GET_CACHE_LINE_SIZE_H_INCLUDED

// Author: Nick Strupat
// Date: October 29, 2010
// Returns the cache line size (in bytes) of the processor, or 0 on failure

#include <stddef.h>
size_t cache_line_size();

#if defined(__APPLE__)

#include <sys/sysctl.h>
size_t cache_line_size() {
    size_t line_size = 0;
    size_t sizeof_line_size = sizeof(line_size);
    sysctlbyname("hw.cachelinesize", &line_size, &sizeof_line_size, 0, 0);
    return line_size;
}

#elif defined(_WIN32)

#include <stdlib.h>
#include <windows.h>
size_t cache_line_size() {
    size_t line_size = 0;
    DWORD buffer_size = 0;
    DWORD i = 0;
    SYSTEM_LOGICAL_PROCESSOR_INFORMATION * buffer = 0;

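    /* First call is expected to fail and only reports the required size. */
    GetLogicalProcessorInformation(0, &buffer_size);
    buffer = (SYSTEM_LOGICAL_PROCESSOR_INFORMATION *)malloc(buffer_size);
    if (!buffer || !GetLogicalProcessorInformation(&buffer[0], &buffer_size)) {
        free(buffer);
        return 0;
    }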

    for (i = 0; i != buffer_size / sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION); ++i) {
        if (buffer[i].Relationship == RelationCache && buffer[i].Cache.Level == 1) {
            line_size = buffer[i].Cache.LineSize;
            break;
        }
    }

    free(buffer);
    return line_size;
}

#elif defined(__linux__)

#include <stdio.h>
size_t cache_line_size() {
    FILE * p = 0;
    p = fopen("/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size", "r");
    unsigned int i = 0;
    if (p) {
        fscanf(p, "%u", &i);
        fclose(p);
    }
    return i;
}

#else
#error Unrecognized platform
#endif

#endif

On x86, you can use the CPUID instruction with function 2 to determine various properties of the cache and the TLB. Parsing the output of function 2 is somewhat complicated, so I'll refer you to section 3.1.3 of the Intel Processor Identification and the CPUID Instruction (PDF).

To get this data from C/C++ code, you'll need to use inline assembly, compiler intrinsics, or call an external assembly function to perform the CPUID instruction.
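As an illustration of the compiler-helper route, the sketch below uses __get_cpuid from the <cpuid.h> header shipped with GCC and Clang. Note that it does not parse function 2: it reads function 1 instead, where EBX bits 15:8 give the CLFLUSH line size in 8-byte units, which is simpler and on typical x86 processors matches the cache line size. The function name cpuid_clflush_line_size is made up for this example, and on non-x86 targets it simply returns 0.

```c
#include <stddef.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>   /* GCC/Clang helper header (assumed toolchain) */
#endif

/* Hypothetical helper: returns the CLFLUSH line size in bytes, or 0 if
 * CPUID function 1 is unavailable, CLFLUSH is not supported, or the
 * build target is not x86. */
size_t cpuid_clflush_line_size(void) {
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    if (!(edx & (1u << 19)))      /* EDX bit 19: CLFLUSH supported */
        return 0;
    /* EBX bits 15:8: CLFLUSH line size in units of 8 bytes. */
    return (size_t)((ebx >> 8) & 0xffu) * 8;
#else
    return 0;
#endif
}
```

With MSVC the same query would go through the __cpuid intrinsic from <intrin.h> rather than <cpuid.h>.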

If you're using SDL2, you can use this function:

int SDL_GetCPUCacheLineSize(void);

It returns the L1 cache line size, in bytes.

On my x86_64 machine, running this code snippet:

printf("CacheLineSize = %d",SDL_GetCPUCacheLineSize());

produces CacheLineSize = 64.

I know I'm a little late, but I'm adding this information for future visitors. The SDL documentation currently says the number returned is in KB, but it is actually in bytes.

On the Windows platform:

From http://blogs.msdn.com/oldnewthing/archive/2009/12/08/9933836.aspx:

The GetLogicalProcessorInformation function will give you characteristics of the logical processors in use by the system. You can walk the SYSTEM_LOGICAL_PROCESSOR_INFORMATION returned by the function looking for entries of type RelationCache. Each such entry contains a ProcessorMask which tells you which processor(s) the entry applies to, and in the CACHE_DESCRIPTOR, it tells you what type of cache is being described and how big the cache line is for that cache.

ARMv6 and above has C0, the Cache Type Register. However, it's only available in privileged mode.

For example, from the Cortex™-A8 Technical Reference Manual:

The purpose of the Cache Type Register is to determine the instruction and data cache minimum line length in bytes to enable a range of addresses to be invalidated.

The Cache Type Register is:

  • a read-only register
  • accessible in privileged modes only.

The contents of the Cache Type Register depend on the specific implementation. Figure 3-2 shows the bit arrangement of the Cache Type Register...


Don't assume the ARM processor has a cache (apparently, some can be configured without one). The standard way to determine it is via C0. From the ARM ARM, page B6-6:

From ARMv6, the System Control Coprocessor Cache Type register is the mandated method to define the L1 caches, see Cache Type register on page B6-14. It is also the recommended method for earlier variants of the architecture. In addition, Considerations for additional levels of cache on page B6-12 describes architecture guidelines for level 2 cache support.
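Once the register value has been obtained in privileged code, decoding the minimum line lengths is simple bit arithmetic. The sketch below assumes the ARMv7 layout of the Cache Type Register (the layout used by ARMv7 cores such as the Cortex-A8), where DminLine sits in bits 19:16 and IminLine in bits 3:0, each holding log2 of the line length in 4-byte words. The ARMv6 layout is different, so this does not apply there. The function names are invented for this example, and reading the register itself is left out because it cannot be done from user mode.

```c
#include <stdint.h>

/* Assumes the ARMv7 Cache Type Register format.
 * DminLine (bits 19:16) is log2 of the number of 4-byte words in the
 * smallest data cache line, so the size in bytes is 4 << DminLine. */
unsigned ctr_dcache_min_line_bytes(uint32_t ctr) {
    return 4u << ((ctr >> 16) & 0xfu);
}

/* IminLine (bits 3:0) uses the same encoding for the instruction cache. */
unsigned ctr_icache_min_line_bytes(uint32_t ctr) {
    return 4u << (ctr & 0xfu);
}
```

For instance, a register value whose DminLine and IminLine fields are both 4 decodes to 64-byte lines.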

Since C++17, you can use std::hardware_destructive_interference_size.
It's defined as:

Minimum offset between two objects to avoid false sharing. Guaranteed to be at least alignof(std::max_align_t).

You can also try to do it programmatically by measuring some timing. Obviously, it won't always be as precise as cpuid and the like, but it is more portable. ATLAS does it at its configuration stage, so you may want to look at it:

http://math-atlas.sourceforge.net/
