
Alignment and performance

Original post: 2012-01-31 10:19:53 · Tags: c++, c, linux, sse, libc

Do the routines strcmp (for comparing char *) and memcmp (for everything else) run faster on x86_64 when the memory block is aligned in some way (and if so, how)? Does libc use SSE for these routines?

2 Answers

It depends, but on architectures where alignment matters or where SIMD instructions are available, the routines typically operate on the leading bytes, then do as many wide aligned operations as the data allows, then operate on the trailing bytes.
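To make the leading/wide/trailing pattern concrete, here is a minimal sketch in C. It is not libc's implementation: `blocks_equal` is a hypothetical helper that only reports equality, and it uses 8-byte words where a real libc would typically use hand-written SIMD code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, NOT libc's memcmp: returns 1 if the two blocks of
   n bytes are equal, 0 otherwise. */
int blocks_equal(const void *pa, const void *pb, size_t n)
{
    const unsigned char *a = pa;
    const unsigned char *b = pb;

    assert(n == 0 || (pa != NULL && pb != NULL));

    /* Leading bytes: go one byte at a time until `a` is 8-byte aligned. */
    while (n > 0 && ((uintptr_t)a % sizeof(uint64_t)) != 0) {
        if (*a != *b)
            return 0;
        a++; b++; n--;
    }

    /* Wide part: compare 8 bytes per step. The loads go through memcpy so
       the code stays valid C even when `b` is not aligned the same way as
       `a`; compilers usually turn these into plain 64-bit loads. */
    while (n >= sizeof(uint64_t)) {
        uint64_t wa, wb;
        memcpy(&wa, a, sizeof wa);
        memcpy(&wb, b, sizeof wb);
        if (wa != wb)
            return 0;
        a += sizeof wa; b += sizeof wb; n -= sizeof wa;
    }

    /* Trailing bytes: whatever is left after the last full word. */
    while (n > 0) {
        if (*a != *b)
            return 0;
        a++; b++; n--;
    }
    return 1;
}
```

For a short or badly aligned block, most of the work happens in the first and last loops; for a long block, almost all of it happens in the wide loop, which is why alignment tends to matter less as the block grows.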

Whether the leading and trailing bytes contribute significantly to the processing time for your data can be determined by experiment.
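One way to run such an experiment is sketched below. `time_memcmp` is a hypothetical helper, and the 64-byte alignment, the block sizes and the repetition count are all assumptions to adjust for your own data. It uses only standard C11 (`aligned_alloc`, `clock`), so it measures CPU time at coarse resolution.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: time `reps` calls of memcmp on two equal blocks of
   `len` bytes that start `offset` bytes past a 64-byte boundary.
   Returns elapsed CPU seconds, or -1.0 on failure. */
double time_memcmp(size_t offset, size_t len, int reps)
{
    assert(offset < 64);

    /* aligned_alloc wants a size that is a multiple of the alignment. */
    size_t total = ((len + 64 + 63) / 64) * 64;
    unsigned char *a = aligned_alloc(64, total);
    unsigned char *b = aligned_alloc(64, total);
    if (a == NULL || b == NULL) {
        free(a);
        free(b);
        return -1.0;
    }
    memset(a, 'x', total);
    memset(b, 'x', total);

    /* Calling through a volatile function pointer keeps the compiler from
       inlining the comparison or hoisting it out of the loop. */
    int (*volatile cmp)(const void *, const void *, size_t) = memcmp;
    int sink = 0;

    clock_t start = clock();
    for (int i = 0; i < reps; i++)
        sink |= cmp(a + offset, b + offset, len);
    clock_t end = clock();

    free(a);
    free(b);
    if (sink != 0)      /* the blocks are equal, so this should not happen */
        return -1.0;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling `time_memcmp(0, len, reps)` and `time_memcmp(1, len, reps)` for a few values of `len` gives a rough picture of how much misalignment costs at each size. Repeat the runs several times and use a large `reps`, since a single measurement is noisy.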

If you are worried about comparison performance, you should take a look at the well-known Boyer-Moore algorithm and this post from GNU Grep author Mike Haertel.

He explains how one can manage to be really fast at searching for something in a data block.

His summary is quite clear about what to do:

Use Boyer-Moore (and unroll its inner loop a few times).

Roll your own unbuffered input using raw system calls. Avoid copying the input bytes before searching them. (Do, however, use buffered output. The normal grep scenario is that the amount of output is small compared to the amount of input, so the overhead of output buffer copying is small, while savings due to avoiding many small unbuffered writes can be large.)

Don't look for newlines in the input until after you've found a match.

Try to set things up (page-aligned buffers, page-sized read chunks, optionally use mmap) so the kernel can ALSO avoid copying the bytes.
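To illustrate the first point of that summary, here is a minimal sketch of the Boyer-Moore-Horspool variant, which keeps only the bad-character skip table of full Boyer-Moore. It is not GNU grep's code, the inner loop is not unrolled, and `horspool_search` is a hypothetical name. It searches a raw byte buffer directly, so nothing in it looks at newlines before a match is found.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns a pointer to the first occurrence of
   pat[0..m) in text[0..n), or NULL if there is none. */
const unsigned char *horspool_search(const unsigned char *text, size_t n,
                                     const unsigned char *pat, size_t m)
{
    size_t skip[UCHAR_MAX + 1];

    assert(text != NULL && pat != NULL);
    if (m == 0)
        return text;
    if (m > n)
        return NULL;

    /* A byte that does not occur in the pattern allows a full-length jump. */
    for (size_t i = 0; i <= UCHAR_MAX; i++)
        skip[i] = m;
    /* Otherwise jump so that the byte's last occurrence in the pattern
       (the final pattern byte excluded) lines up with the text. */
    for (size_t i = 0; i + 1 < m; i++)
        skip[pat[i]] = m - 1 - i;

    size_t pos = 0;
    while (pos <= n - m) {
        unsigned char last = text[pos + m - 1];
        /* Check the last byte of the window first, then the rest. */
        if (last == pat[m - 1] && memcmp(text + pos, pat, m - 1) == 0)
            return text + pos;
        pos += skip[last];
    }
    return NULL;
}
```

Because the window jumps by up to `m` bytes per step, many input bytes are never examined at all, which is where most of the speed comes from when the pattern is reasonably long.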