
Alignment and performance

Do the routines strcmp (for comparing char * strings) and memcmp (for everything else) run faster on x86_64 when the memory block is aligned in some way, and if so, aligned how? Does libc use SSE for these routines?

It depends, but on architectures where alignment matters or where SIMD instructions are available, the routines typically operate on the leading bytes, then do as many wide aligned operations as the data allows, then operate on the trailing bytes.
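The leading/wide/trailing pattern can be sketched in plain C. This is only an illustration, not how any particular libc implements memcmp: the name `sketch_memequal` is hypothetical, it only reports equality (not ordering), and it uses 8-byte words where a real implementation on x86_64 might use SIMD registers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not a libc function: returns 1 if the first n
 * bytes of a and b are equal, 0 otherwise. */
int sketch_memequal(const void *a, const void *b, size_t n)
{
    const unsigned char *pa = a;
    const unsigned char *pb = b;

    /* 1. Leading bytes: go byte by byte until pa is 8-byte aligned. */
    while (n > 0 && ((uintptr_t)pa % sizeof(uint64_t)) != 0) {
        if (*pa++ != *pb++)
            return 0;
        n--;
    }

    /* 2. Wide operations: compare 8 bytes per iteration. memcpy keeps
     * the loads well-defined even if pb is not aligned the same way as
     * pa; optimizing compilers usually turn each copy into one load. */
    while (n >= sizeof(uint64_t)) {
        uint64_t wa, wb;
        memcpy(&wa, pa, sizeof wa);
        memcpy(&wb, pb, sizeof wb);
        if (wa != wb)
            return 0;
        pa += sizeof(uint64_t);
        pb += sizeof(uint64_t);
        n -= sizeof(uint64_t);
    }

    /* 3. Trailing bytes: whatever is left after the last full word. */
    while (n > 0) {
        if (*pa++ != *pb++)
            return 0;
        n--;
    }
    return 1;
}
```

For a short block, steps 1 and 3 can account for most of the work; for a long block, step 2 dominates and the starting alignment matters much less.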

You can determine by experiment whether the leading and trailing bytes contribute significantly to the processing time for your data.
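One rough way to run such an experiment is to time memcmp on the same data at different starting offsets. The helper below is a hypothetical sketch (`time_memcmp` is not a library function); it assumes a C11 libc that provides `aligned_alloc`, and it uses `clock()`, which is coarse, so large repetition counts are needed for meaningful numbers.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: times `reps` calls of memcmp on two equal
 * `len`-byte blocks that start `offset` bytes past a 64-byte boundary.
 * Returns elapsed CPU seconds, or -1.0 if allocation fails. */
double time_memcmp(size_t offset, size_t len, int reps)
{
    /* aligned_alloc wants a size that is a multiple of the alignment. */
    size_t total = ((offset + len + 63) / 64 + 1) * 64;
    unsigned char *a = aligned_alloc(64, total);
    unsigned char *b = aligned_alloc(64, total);
    if (a == NULL || b == NULL) {
        free(a);
        free(b);
        return -1.0;
    }
    memset(a, 0x5a, total);
    memset(b, 0x5a, total);

    /* Calling through a volatile function pointer and accumulating into
     * a volatile variable stops the compiler from hoisting or dropping
     * the calls. */
    int (*volatile cmp)(const void *, const void *, size_t) = memcmp;
    volatile int sink = 0;

    clock_t start = clock();
    for (int i = 0; i < reps; i++)
        sink += cmp(a + offset, b + offset, len);
    clock_t stop = clock();

    (void)sink;
    free(a);
    free(b);
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Comparing, say, `time_memcmp(0, 4096, 10000000)` with `time_memcmp(1, 4096, 10000000)`, and then repeating with a much smaller `len`, gives a first impression of how much alignment matters on a given machine and libc. The results depend heavily on the hardware and library version, so they should not be generalized.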

If you are worried about comparison performance, take a look at the well-known Boyer-Moore algorithm and this post from the GNU grep author, Mike Haertel.

He explains how to search for something in a block of data really fast.

His summary is quite clear about what to do:

  • Use Boyer-Moore (and unroll its inner loop a few times).
  • Roll your own unbuffered input using raw system calls. Avoid copying the input bytes before searching them. (Do, however, use buffered output. The normal grep scenario is that the amount of output is small compared to the amount of input, so the overhead of output buffer copying is small, while savings due to avoiding many small unbuffered writes can be large.)
  • Don't look for newlines in the input until after you've found a match.
  • Try to set things up (page-aligned buffers, page-sized read chunks, optionally use mmap) so the kernel can ALSO avoid copying the bytes.
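To give an idea of the first point, here is a minimal sketch of the Boyer-Moore-Horspool variant, which keeps only the bad-character shift table of full Boyer-Moore. It is not GNU grep's code: the function name `horspool_search` is hypothetical, and the inner loop is left un-unrolled for readability.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: returns a pointer to the first occurrence of
 * needle in hay, or NULL if there is none. */
const unsigned char *horspool_search(const unsigned char *hay, size_t hay_len,
                                     const unsigned char *needle,
                                     size_t needle_len)
{
    size_t shift[UCHAR_MAX + 1];

    if (needle_len == 0)
        return hay;
    if (needle_len > hay_len)
        return NULL;

    /* A byte that does not occur in the needle (ignoring the needle's
     * last position) lets the window jump by the full needle length. */
    for (size_t c = 0; c <= UCHAR_MAX; c++)
        shift[c] = needle_len;
    for (size_t i = 0; i + 1 < needle_len; i++)
        shift[needle[i]] = needle_len - 1 - i;

    size_t pos = 0;
    while (pos <= hay_len - needle_len) {
        /* Compare the window against the needle from right to left. */
        size_t i = needle_len;
        while (i > 0 && hay[pos + i - 1] == needle[i - 1])
            i--;
        if (i == 0)
            return hay + pos;
        /* Shift according to the byte under the window's last position. */
        pos += shift[hay[pos + needle_len - 1]];
    }
    return NULL;
}
```

Because the shift is usually larger than one, most input bytes are never examined at all, which is why the algorithm pairs well with the advice to avoid copying the input or scanning it for newlines first.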

