Effective bounds checking on a ring buffer

As some might have noticed, I'm trying to implement a ring buffer. I want to have a certain amount of safety measures in the data structure while not losing too much efficiency.

My current solution implements two types of counters: an index, which is a direct offset into the buffer's memory, and a sequence number, which is just a counter of type size_t. The sequence number is used by iterators to access the ring buffer. Therefore the ring buffer has to convert from sequence number to buffer index on every buffer access. This is usually rather efficient:

size_t offset = seqNum - m_tailSeq;
size_t index = (m_tailIdx + offset) % m_size;

where seqNum is the sequence number to be converted, m_tailSeq is the sequence number of the oldest element in the buffer, m_tailIdx is the buffer index of the oldest element in the buffer, and m_size is the size of the buffer memory.

However, if I keep adding elements to the buffer long enough, the sequence numbers will overflow. So I have to check for this. And when I do that, my short and sweet conversion turns into this monster:

size_type getIndex(size_type seqNum) const
{
    size_type headSeq = m_tailSeq + m_numElements;

    // sequence does not wrap around
    if (m_tailSeq < headSeq)
    {
        // bounds check
        if(m_tailSeq <= seqNum && seqNum < headSeq) {
            size_type offset = seqNum - m_tailSeq;
            return (m_tailIdx + offset) % m_size;
        } else {
            throw BaseException("RingBuffer: access out of bounds", __FILE__, __LINE__);
        }
    }
    // sequence does wrap around
    else if (headSeq < m_tailSeq)
    {
        //bounds check (inverted from above)
        if(seqNum < headSeq) {
            // +1 because the step from SIZE_TYPE_MAX to 0 counts as one element
            size_type offset = (SIZE_TYPE_MAX - m_tailSeq) + 1 + seqNum;
            return (m_tailIdx + offset) % m_size;
        } else if (seqNum >= m_tailSeq) {
            size_type offset = seqNum - m_tailSeq;
            return (m_tailIdx + offset) % m_size;
        } else {
            throw BaseException("RingBuffer: access out of bounds", __FILE__, __LINE__);
        }
    }
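    else {
        // headSeq == m_tailSeq means m_numElements == 0, i.e. the buffer is empty;
        // a plain else ensures every path either returns or throws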
        throw BaseException("RingBufferIterator: accessing empty buffer", __FILE__, __LINE__);
    }
}

This amounts to two integer additions, one integer subtraction, three integer comparisons, and one modulo operation in the best case on every single buffer access. Needless to say, iterating over the buffer becomes pretty expensive. However, since I want to use this buffer in high-performance scenarios (i.e. as an event queue in a soft real-time application), I would like this data structure to be as efficient as possible.

The current use case would be as an event buffer. One (or possibly more than one) system would write events into the buffer, and other systems (more than one) would process these events at their own pace without removing them. When the buffer is full, old events are simply overwritten. This way I always have a record of the last few hundred events, and different systems can go over them at their respective update rates and pick out the events that are relevant to them. The different systems will keep an iterator that points into the ring buffer so they know where they left off last time and where to resume. When a system starts processing events, it needs to determine whether its iterator is still valid or whether the element it points to has been overwritten. Events are likely to be processed in big chunks at a time, so incrementing and dereferencing should be quick. So basically we're looking at an MPMC ring buffer in a potentially multithreaded context.

The only solution I can come up with myself is to move the burden of error checking to the user of the buffer. I.e. the user has to first check (by some means) whether its iterator into the buffer is valid, make sure a certain stretch of the buffer stays valid, and then iterate over this stretch without any further checks. However, this seems error-prone, since I have to check the safety of the access in multiple parts of the program instead of just one place, and it will get hairy if I should ever decide to make the buffer thread-safe.

Am I missing something? Can this be done any better? Am I making some beginner's mistake?

As I mentioned in a comment, unsigned integer overflow is a well-defined operation. That is the key to implementing efficient sequence numbers in C++. We can simply subtract two unsigned integers to get the distance between them, then forward that distance to a function that implements access by index with bounds checking. As usual with this technique, it works as long as every possible index is smaller than half the maximum value of the sequence number.

#include <array>
#include <climits>
#include <cstddef>
#include <iostream>
#include <stdexcept>

unsigned int const SEQUENCE_NUMBER_FIRST = UINT_MAX-10;

class RingBuffer
{
public:
    void PushBack( char c )
    {
        GetBySeqNumber(m_tailSeq++) = c;
        if( Size() == m_buffer.size()+1 )
            PopFront();
    }
    void PopFront()
    {
        ++m_headSeq;
        if( ++m_offset % m_buffer.size() == 0 )
            m_offset = 0;
    }
    char& GetByIndex( size_t n )
    {
        if( n >= Size() )
            throw std::out_of_range("RingBuffer: index out of range");
        return m_buffer[ (n+m_offset) % m_buffer.size() ];
    }
    char& GetBySeqNumber( unsigned int n )
    {
        // Unsigned subtraction wraps modulo 2^N, which is well defined in C++.
        // The same expression on signed integers would be undefined behavior
        // on overflow.
        return GetByIndex( n-m_headSeq );
    }
    size_t Size() const
    {
        return m_tailSeq - m_headSeq;
    }
private:
    size_t m_offset = 0;
    unsigned int m_headSeq = SEQUENCE_NUMBER_FIRST;
    unsigned int m_tailSeq = SEQUENCE_NUMBER_FIRST;
    std::array<char,26> m_buffer;
};

int main()
{
    // initialize
    RingBuffer buf;
    for( char i=0; i<26; ++i )
        buf.PushBack( 'a'+i );

    // access through sequence numbers
    // (shift the range by one in either direction to get an out_of_range exception)
    for( unsigned int i=0; i<buf.Size(); ++i )
        std::cout << buf.GetBySeqNumber( SEQUENCE_NUMBER_FIRST+i );
    std::cout << std::endl;

    // push some more to overwrite first 10 values
    for( char i=0; i<10; ++i )
        buf.PushBack( '0'+i );

    // access through indexes
    // (extend the range by one to get an out_of_range exception)
    for( size_t i=0; i<buf.Size(); ++i )
        std::cout << buf.GetByIndex(i);
    std::cout << std::endl;

    return 0;
}
