Possible buffer overflow problem

Question

I have the following C++ code to extract a given range of text from a piece table data structure. Here is the member function of class PieceTable that stores the given range of text in the character array buffer:

void PieceTable::getTextInRange(unsigned __int64 startPos, unsigned __int64 endPos, char buffer[]){

    char* totalBuffer = new char[getSize() + 2];

    getBuffer(totalBuffer);

    if(endPos >= getSize())
        endPos = getSize() - 1; 

    cout<<"startPos : "<<startPos<<endl;
    cout<<"endPos : "<<endPos<<endl;

    memcpy(buffer, &totalBuffer[startPos], endPos - startPos + 1);

    buffer[endPos - startPos + 2] = '\0';

    if(totalBuffer != 0)
        delete[] totalBuffer;
    totalBuffer = 0;
}

Here is the piece of code in main that I use to test this function:

temp2 = new char[end - start + 2];  //changing 2 to 3 solves the problem
pieceTable.getTextInRange(Start, end, temp2);
for(int i = 0; i< end - start + 1; i++)
   cout<<temp2[i];
cout<<endl;

if( temp2 != 0)
{
  delete[] temp2;   //this line causes the heap corruption error
  temp2 = 0;
}

Declaration of temp2: char* temp2;

Whenever the program reaches the delete[] temp2 statement, there is a heap corruption error. The problem does not occur if I allocate memory for temp2 as:
temp2 = new char[end - start + 3]
So changing the length solves the problem. I know that I am getting the lengths wrong somewhere, but I can't figure out where.

EDIT: getSize():

__int64 PieceTable::getSize()
{
    return dList.getLength(dList.getBack());
}

I am using a piece table data structure. It is described in this paper: http://www.cs.unm.edu/~crowley/papers/sds.pdf

I may be wrong, but I don't think there is any problem with getSize(), since getBuffer, the function I use to retrieve the entire buffer, works as shown in the code.

Answer 1

In PieceTable::getTextInRange, you have this line:

buffer[endPos - startPos + 2] = '\0';

and when you allocate the array that you pass in as buffer, you allocate it like this:

temp2 = new char[end - start + 2];

Let's put in some real numbers...

buffer[5 - 2 + 2] = '\0';

temp2 = new char[5 - 2 + 2];

which is equivalent to:

buffer[5] = '\0';

temp2 = new char[5];

Well, there's your problem. If you do new char[5] you get an array that has valid indexes from 0 through 4. 5 is not a valid index into this array.

Might I suggest that you make it a rule, broken only in the most extenuating of circumstances, that you always specify ranges in terms of [begin, end) like the STL does. This means you specify one past the last desired index for end. This makes range calculation math much less error prone. Also, the consistency of the interface with the way the STL works makes it easier to work with. For example, calculating the size of the range is always end - begin with this scheme.

There is an old (circa 1982) paper by E. W. Dijkstra that gives some good reasons why this scheme for expressing ranges is the best one.
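As a rough illustration of the [begin, end) convention, here is a minimal sketch. The name copyRange is hypothetical, and a std::string stands in for the piece table's contents, so this is not the asker's actual class, only one way the range math could look.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: copies the half-open range [begin, end) of `text`
// into `out` and null-terminates it. `out` must hold end - begin + 1 chars.
// Returns the number of characters copied (excluding the terminator).
std::size_t copyRange(const std::string& text, std::size_t begin,
                      std::size_t end, char* out) {
    end = std::min(end, text.size());       // clamp to one past the last char
    begin = std::min(begin, end);           // an inverted range becomes empty
    const std::size_t count = end - begin;  // size is simply end - begin
    std::memcpy(out, text.data() + begin, count);
    out[count] = '\0';                      // last valid index of a count + 1 buffer
    return count;
}
```

With this scheme the caller allocates end - begin + 1 chars, and the terminator lands at index end - begin, with no + 2 or + 3 anywhere.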

Answer 2

The reason changing the 2 to a 3 in the code:

temp2 = new char[end - start + 2];

works is that otherwise you write past the end of the buffer in getTextInRange (you're off by one).

Your end and start above correspond to the arguments endPos and startPos in getTextInRange, and in getTextInRange you have:

buffer[endPos - startPos + 2] = '\0';

The range of your array is [0, endPos - startPos + 2); therefore the element at position endPos - startPos + 2 is one past the end of your array. Overwriting this value is causing the heap to become corrupted.
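An alternative to enlarging the allocation is to move the terminator. The memcpy call copies endPos - startPos + 1 characters into indexes 0 through endPos - startPos, so the terminator belongs at index endPos - startPos + 1, which is the last valid index of a buffer of size endPos - startPos + 2. The sketch below assumes a free function named copyInclusive (hypothetical) with the total buffer passed in, in place of the member function and getBuffer().

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical free-function version of getTextInRange using the inclusive
// range [startPos, endPos]. `total` stands in for the result of getBuffer().
// `buffer` must hold at least endPos - startPos + 2 chars.
void copyInclusive(const char* total, std::size_t totalSize,
                   std::size_t startPos, std::size_t endPos, char* buffer) {
    assert(totalSize > 0 && startPos < totalSize && startPos <= endPos);
    if (endPos >= totalSize)
        endPos = totalSize - 1;                       // clamp as the original does
    const std::size_t length = endPos - startPos + 1; // characters to copy
    std::memcpy(buffer, total + startPos, length);
    buffer[length] = '\0';                            // index length, not length + 1
}
```

With this placement the original allocation of end - start + 2 chars is exactly large enough, and no uninitialized char is left between the copied text and the terminator.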

Answer 3

It is clear from your code that the last index you're using in getTextInRange is this:

endPos-startPos+2 //last index

which pretty much explains why you need to allocate memory of at least this size:

endPos-startPos+3 //number of objects : memory allocation

That is, if you allocate memory for N objects, the last object in the array is accessed with the index N-1, which is also the maximum index for the array. The index N falls out of range. Recall that the index starts at 0, so it has to end at N-1, not at N.
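To make the N versus N-1 rule visible at run time rather than as silent heap corruption, one option is a bounds-checked container. This is only a sketch, assuming the buffer can be a std::vector<char> instead of a raw new[] array; writeChecked is a hypothetical helper name.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: writes `value` at `index` with bounds checking.
// Returns false instead of corrupting memory when the index is out of range.
bool writeChecked(std::vector<char>& buffer, std::size_t index, char value) {
    try {
        buffer.at(index) = value;  // at() throws std::out_of_range if index >= size()
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

For a vector of size 5, writing at index 4 succeeds and writing at index 5 is rejected, which mirrors the valid index range 0 through N-1 described above.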

3 answers

Answer 1
7 Accepted 2011-12-09 05:43:58

Answer 2
2 2011-12-09 05:44:13

Answer 3
1 2011-12-09 05:46:26
