Memory Access Violations When Using SSE Operations

I've been trying to re-implement some existing vector and matrix classes to use SSE3 instructions, and I seem to be running into "memory access violation" errors whenever I perform a series of operations on an array of vectors. I'm relatively new to SSE, so I've been starting off simple. Here's the entirety of my vector class:

class SSEVector3D
{
public:

   SSEVector3D();
   SSEVector3D(float x, float y, float z);

   SSEVector3D& operator+=(const SSEVector3D& rhs); //< Elementwise Addition

   float x() const;
   float y() const;
   float z() const;

private:

   float m_coords[3] __attribute__ ((aligned (16))); //< The x, y and z coordinates

};

So, not a whole lot going on yet, just some constructors, accessors, and one operation. Using my (admittedly limited) knowledge of SSE, I implemented the addition operation as follows:

SSEVector3D& SSEVector3D::operator+=(const SSEVector3D& rhs) 
{
   __m128 * pLhs = (__m128 *) m_coords;
   __m128 * pRhs = (__m128 *) rhs.m_coords;

   *pLhs = _mm_add_ps(*pLhs, *pRhs);

   return (*this);
}

To speed-test my new vector class against the old one (to see if it's worth re-implementing the whole thing), I created a simple program that generates a random array of SSEVector3D objects and adds them together. Nothing too complicated:

SSEVector3D sseSum(0, 0, 0);

for(i=0; i<sseVectors.size(); i++)
{
   sseSum += sseVectors[i];
}

printf("Total: %f %f %f\n", sseSum.x(), sseSum.y(), sseSum.z());

The sseVectors variable is a std::vector containing elements of type SSEVector3D, whose components are all initialized to random numbers between -1 and 1.

Here's the issue I'm having. If the size of sseVectors is 8,191 or less (a number I arrived at through a lot of trial and error), this runs fine. If the size is 8,192 or more, I get this error when I try to run it:

signal: SIGSEGV, si_code: 0 (memory access violation at address: 0x00000080)

However, if I comment out the printf statement at the end, I get no error even if sseVectors has a size of 8,192 or more.

Is there something wrong with the way I've written this vector class? I'm running Ubuntu 12.04.1 with GCC version 4.6.

First and foremost, don't do this:

__m128 * pLhs = (__m128 *) m_coords;
__m128 * pRhs = (__m128 *) rhs.m_coords;
*pLhs = _mm_add_ps(*pLhs, *pRhs);

With SSE, always do your loads and stores explicitly via the appropriate intrinsics, never by just dereferencing. Instead of storing an array of 3 floats in your class, store a value of type __m128. That should make the compiler align instances of your class correctly, without any need for alignment attributes.
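As a minimal sketch of that advice, the class could hold a __m128 directly and only touch it through intrinsics. The class name Vec3SSE and the private lane() helper are hypothetical names chosen for illustration, not part of the original code, and the unused fourth lane is assumed to be zero padding.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_set_ps, _mm_add_ps, ...

// Hypothetical rewrite of the question's class; the name Vec3SSE is illustrative.
class Vec3SSE
{
public:
   Vec3SSE() : m_v(_mm_setzero_ps()) {}

   // _mm_set_ps takes its arguments from the highest lane to the lowest,
   // so x ends up in lane 0, y in lane 1, z in lane 2.
   Vec3SSE(float x, float y, float z) : m_v(_mm_set_ps(0.0f, z, y, x)) {}

   // Elementwise addition on the register value, with no pointer casts.
   Vec3SSE& operator+=(const Vec3SSE& rhs)
   {
      m_v = _mm_add_ps(m_v, rhs.m_v);
      return *this;
   }

   float x() const { return lane(0); }
   float y() const { return lane(1); }
   float z() const { return lane(2); }

private:
   // Copies all four lanes out with an explicit store and returns one of them.
   // _mm_storeu_ps places no alignment requirement on the destination.
   float lane(int i) const
   {
      float tmp[4];
      _mm_storeu_ps(tmp, m_v);
      return tmp[i];
   }

   __m128 m_v;  // x, y, z in lanes 0..2; lane 3 is unused padding
};
```

A __m128 member only guarantees alignment for objects the compiler places itself. Heap storage obtained through an allocator, for example inside a std::vector, is aligned only if that allocator honors 16-byte alignment, which is worth checking on a 32-bit target.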

Note, however, that this won't work very well with MSVC. MSVC generally seems unable to cope with alignment requirements stronger than 8 bytes for by-value arguments :-(. The last time I needed to port SSE code to Windows, my solution was to use Intel's C++ compiler for the SSE parts instead of MSVC...

The trick is to notice that __m128 is 16-byte aligned. Use an aligned allocation function such as _mm_malloc() (or _aligned_malloc() on MSVC) to ensure that your float array is correctly aligned; then you can go ahead and cast your float pointer to a pointer to __m128. Also make sure that the number of floats you allocate is divisible by four.
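A rough sketch of the aligned-array approach might look like the following. It assumes _mm_malloc and _mm_free are available through xmmintrin.h, as they are with GCC, and it uses explicit aligned loads and stores rather than a pointer cast, in line with the earlier advice. The helper names allocAlignedFloats and addInPlace are hypothetical.

```cpp
#include <xmmintrin.h>  // _mm_malloc, _mm_free, _mm_load_ps, _mm_add_ps, _mm_store_ps
#include <cstddef>      // std::size_t

// Hypothetical helper: allocates count floats on a 16-byte boundary.
// The result must be released with _mm_free, not free or delete.
float* allocAlignedFloats(std::size_t count)
{
   return static_cast<float*>(_mm_malloc(count * sizeof(float), 16));
}

// Hypothetical helper: adds src into dst elementwise, four floats at a time.
// Assumes both pointers are 16-byte aligned and count is a multiple of 4.
void addInPlace(float* dst, const float* src, std::size_t count)
{
   for (std::size_t i = 0; i < count; i += 4)
   {
      __m128 a = _mm_load_ps(dst + i);          // aligned load
      __m128 b = _mm_load_ps(src + i);          // aligned load
      _mm_store_ps(dst + i, _mm_add_ps(a, b));  // aligned store
   }
}
```

Padding each 3-component vector to four floats keeps every element on a 16-byte boundary, at the cost of one unused float per vector.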
