Get byte array of highest value

I've got an array containing millions of bytes. These bytes encode int values (Int16, Int24 or Int32). Now I want to find, within a given number of bytes, the bytes that make up the largest int value.

So, to explain this better, let's imagine an array with 10 entries:

byte[] arr = {255, 10, 55, 60, 128, 90, 88, 66, 199, 56};

I will know whether we use Int16, Int24 or Int32; for this example, let's assume we are using Int16. This means we use 2 bytes to represent each Int16, so the ints consist of:

{255, 10},
{55, 60},
{128, 90},
{88, 66},
{199, 56}
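To make the example concrete, here is a small sketch that decodes those pairs, assuming little-endian byte order (low byte first, which is what BitConverter.ToInt16 does on little-endian machines and what BinaryReader always does). The class name `Int16Pairs` is hypothetical, for illustration only.

```csharp
// Hypothetical helper class, for illustration only.
public static class Int16Pairs
{
    // Decodes consecutive little-endian byte pairs into Int16 values.
    // Assumes data.Length is a multiple of 2.
    public static short[] Decode(byte[] data)
    {
        var values = new short[data.Length / 2];
        for (int i = 0; i < values.Length; i++)
        {
            // Low byte first, high byte second; the cast reinterprets bit 15 as the sign.
            values[i] = (short)(data[2 * i] | (data[2 * i + 1] << 8));
        }
        return values;
    }
}
```

For `arr` this yields 2815, 15415, 23168, 16984 and 14535, so under this byte order the pair {128, 90} (23168) holds the largest value.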

Problem 1: Because this is needed for audio processing, 1046 counts as lower than -2096. So the comparison has to be done on the magnitude, independent of the sign.
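Comparing "independent of the sign" means comparing absolute values. A minimal sketch for Int16 samples, using a hypothetical helper `Magnitude.IsLouder`: widening to int before calling Math.Abs matters because the magnitude of short.MinValue (32768) does not fit in a short.

```csharp
using System;

// Hypothetical helper class, for illustration only.
public static class Magnitude
{
    // True if sample a has a strictly larger magnitude than sample b.
    public static bool IsLouder(short a, short b)
    {
        // Math.Abs(int) is safe here: |short.MinValue| = 32768 fits in an int.
        return Math.Abs((int)a) > Math.Abs((int)b);
    }
}
```

With the numbers from the question, `IsLouder(-2096, 1046)` is true and `IsLouder(1046, -2096)` is false.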

Problem 2: Because this needs to be very fast, converting the bytes into ints for comparison seems inefficient, and there should be another way.

This is the starting point:

    /// <summary>
    /// Gets the maximum value of a number of bytes representing int values
    /// </summary>
    /// <returns>The combined channels.</returns>
    /// <param name="leftChannel">Left channel.</param>
    /// <param name="rightChannel">Right channel.</param>
    /// <param name="bytesPerInt">Bytes per int. 2 bytes = Int16, 3 bytes = Int24, 4 bytes = Int32</param>
    /// <param name="countBytesToCombine">The number of bytes to search for the highest value</param>
    private (byte[] combinedLeft, byte[] combinedRight) CombineChannels(byte[] leftChannel, byte[] rightChannel, int bytesPerInt, int countBytesToCombine)
    {
        throw new NotImplementedException(); // to be implemented
    }

    /// <summary>
    /// Gets the highest byte[] value
    /// </summary>
    /// <returns>The highest value. The length of the returned array equals bytesPerInt.</returns>
    /// <param name="bytes">A subarray of the byte array passed to the method above. Its length equals countBytesToCombine.</param>
    /// <param name="bytesPerInt">The count of bytes representing an int</param>
    private byte[] GetHighestValue(byte[] bytes, int bytesPerInt)
    {
        throw new NotImplementedException(); // to be implemented
    }

Edit 2

This is a working solution, but it takes about 2 seconds to execute with 14 million bytes per channel, which is far too slow.

    /// <summary>
    /// Gets the maximum value of a number of bytes representing Int-Values
    /// </summary>
    /// <returns>The channels.</returns>
    /// <param name="leftChannel">Left channel.</param>
    /// <param name="rightChannel">Right channel.</param>
    /// <param name="bytesPerInt">Bytes per int. 2 bytes = Int16, 3 bytes = Int24, 4 bytes = Int32</param>
    /// <param name="countValuesToCombine">The number of values to search for the highest value</param>
    private (byte[] combinedLeft, byte[] combinedRight) CombineChannels(byte[] leftChannel, byte[] rightChannel, int bytesPerInt, int countValuesToCombine)
    {
        var cLeft = new List<byte>();
        var cRight = new List<byte>();

        for (int i = 0; i < leftChannel.Length; i += countValuesToCombine * bytesPerInt)
        {
            var arrLeft = SubArray(leftChannel, i, countValuesToCombine * bytesPerInt);
            var arrRight = SubArray(rightChannel, i, countValuesToCombine * bytesPerInt);

            cLeft.AddRange(GetHighestValue(arrLeft, bytesPerInt));
            cRight.AddRange(GetHighestValue(arrRight, bytesPerInt));
        }

        return (cLeft.ToArray(), cRight.ToArray());
    }

    /// <summary>
    /// Gets the highest byte[] value 
    /// </summary>
    /// <returns>The highest value.</returns>
    /// <param name="bytes">Bytes.</param>
    /// <param name="bytesPerInt">The count of bytes representing an Int</param>
    private byte[] GetHighestValue(byte[] bytes, int bytesPerInt)
    {
        byte[] bytesOfHighestValue = new byte[bytesPerInt];

        for (int i = 0; i < bytes.Length; i += bytesPerInt)
        {
            var arr = SubArray(bytes, i, bytesPerInt);

            if (IsValueHigher(arr, bytesOfHighestValue, bytesPerInt))
            {
                bytesOfHighestValue = arr;
            }
        }

        return bytesOfHighestValue;
    }

    private bool IsValueHigher(byte[] one, byte[] two, int bytesPerInt)
    {
        var o = ConvertToInt(one, bytesPerInt);
        var t = ConvertToInt(two, bytesPerInt);

        // Widen to long: Math.Abs(int.MinValue) throws an OverflowException.
        return Math.Abs((long)o) > Math.Abs((long)t);
    }

    private int ConvertToInt(byte[] bytes, int bytesPerInt)
    {
        switch (bytesPerInt)
        {
            case 2:
                return BitConverter.ToInt16(bytes, 0);
            case 3:
                return Int24.ToInt32(bytes, 0); // Int24 (like SubArray) is a helper not shown in the post
            case 4:
                return BitConverter.ToInt32(bytes, 0);
        }

        return 0;
    }

This is extremely difficult to explain, so please ask if anything is unclear before downvoting.

OK, so here is a straightforward implementation for 4-byte integers:

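// Requires: using System;
private static int GetHighestValue(byte[] data)
{
  if (data.Length % 4 != 0)
    throw new ArgumentException("Length must be a multiple of 4.", nameof(data));

  // maximumAbs is a long so that Math.Abs(int.MinValue) cannot overflow.
  int maximum = 0;
  long maximumAbs = 0;
  for (var i = 0; i < data.Length; i += 4)
  {
    var current = BitConverter.ToInt32(data, i);
    var currentAbs = Math.Abs((long)current);

    if (currentAbs > maximumAbs)
    {
      maximum = current;
      maximumAbs = currentAbs;
    }
  }

  // The signed value with the largest magnitude.
  return maximum;
}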

Running this on a byte[] with 1 million bytes takes about 3 ms in a Debug build.

I do not know what speed you are aiming for, but for 99% of cases this should be fine.


Edit: Since you updated your question and included sample code, here is an update:

These are some areas that make your code slower than it needs to be:

  • We do not need to create sub-arrays in every iteration of CombineChannels. We can rewrite GetHighestValue so that it takes the array, offset and amount as parameters.

  • Instead of having one CombineChannels method, we should split it up by byte size, for example CombineChannelsInt32, CombineChannelsInt16... This way the methods themselves can store the maximum as an Int32/Int16/... without having to convert it at every iteration.
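A sketch of the first point for the Int16 case: the search reads straight out of the original array from an offset, so no sub-array is allocated. It assumes little-endian data and that `amount` counts Int16 values rather than bytes; it returns the signed sample with the largest magnitude and keeps the first one on ties. The class name `HighestValue` is hypothetical.

```csharp
using System;

// Hypothetical helper class, for illustration only.
public static class HighestValue
{
    // Returns the signed Int16 with the largest magnitude among `amount` values
    // starting at byte `offset` in `bytes` (little-endian, 2 bytes per value).
    public static short GetHighestValueInt16(byte[] bytes, int offset, int amount)
    {
        if (offset < 0 || amount < 0 || offset + 2 * amount > bytes.Length)
            throw new ArgumentOutOfRangeException(nameof(amount));

        short maximum = 0;
        int maximumAbs = 0; // int, so |short.MinValue| = 32768 fits
        int end = offset + 2 * amount;
        for (int i = offset; i < end; i += 2)
        {
            // Decode in place from the original array instead of copying a sub-array.
            short current = (short)(bytes[i] | (bytes[i + 1] << 8));
            int currentAbs = Math.Abs((int)current);
            if (currentAbs > maximumAbs)
            {
                maximum = current;
                maximumAbs = currentAbs;
            }
        }
        return maximum;
    }
}
```

CombineChannels can then call it with `offset = i` and `amount = countValuesToCombine` inside its loop instead of calling SubArray.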

So we would end up with methods like these:

(byte[] combinedLeft, byte[] combinedRight) CombineChannels(byte[] leftChannel, byte[] rightChannel, int bytesPerInt, int countValuesToCombine)
{
  switch (bytesPerInt)
  {
    case 2:
      return CombineChannelsInt16(leftChannel, rightChannel, countValuesToCombine);
    case 3:
      return CombineChannelsInt24(leftChannel, rightChannel, countValuesToCombine);
    case 4:
      return CombineChannelsInt32(leftChannel, rightChannel, countValuesToCombine);
    default:
      // Without a default, not every code path returns a value and the method does not compile.
      throw new ArgumentOutOfRangeException(nameof(bytesPerInt));
  }
}

(byte[] combinedLeft, byte[] combinedRight) CombineChannelsInt16(byte[] leftChannel, byte[] rightChannel, int countValuesToCombine);
(byte[] combinedLeft, byte[] combinedRight) CombineChannelsInt24(byte[] leftChannel, byte[] rightChannel, int countValuesToCombine);
(byte[] combinedLeft, byte[] combinedRight) CombineChannelsInt32(byte[] leftChannel, byte[] rightChannel, int countValuesToCombine);

short GetHighestValueInt16(byte[] bytes, int offset, int amount);
Int24 GetHighestValueInt24(byte[] bytes, int offset, int amount);
int GetHighestValueInt32(byte[] bytes, int offset, int amount);
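One possible body for the Int16 variant, as a sketch under these assumptions: little-endian data, each channel is reduced independently, and a trailing partial block still produces one value. It preallocates the output arrays instead of growing a `List<byte>` and decodes each sample in place. The class `ChannelCombiner` and its private `Combine` method are hypothetical names.

```csharp
using System;

// Hypothetical helper class, for illustration only.
public static class ChannelCombiner
{
    public static (byte[] combinedLeft, byte[] combinedRight) CombineChannelsInt16(
        byte[] leftChannel, byte[] rightChannel, int countValuesToCombine)
    {
        if (countValuesToCombine <= 0)
            throw new ArgumentOutOfRangeException(nameof(countValuesToCombine));
        return (Combine(leftChannel, countValuesToCombine), Combine(rightChannel, countValuesToCombine));
    }

    // Reduces every block of `count` Int16 samples to the sample with the largest magnitude.
    private static byte[] Combine(byte[] channel, int count)
    {
        int samples = channel.Length / 2;
        int blocks = (samples + count - 1) / count; // ceiling, so a partial last block is kept
        var result = new byte[blocks * 2];          // allocated once, no List<byte>

        for (int block = 0; block < blocks; block++)
        {
            int start = block * count * 2;
            int end = Math.Min(start + count * 2, samples * 2);
            short best = 0;
            int bestAbs = -1;
            for (int i = start; i < end; i += 2)
            {
                short s = (short)(channel[i] | (channel[i + 1] << 8));
                int abs = Math.Abs((int)s);
                if (abs > bestAbs) { best = s; bestAbs = abs; }
            }
            // Write the winner back in the same little-endian byte order.
            result[2 * block] = (byte)best;
            result[2 * block + 1] = (byte)(best >> 8);
        }
        return result;
    }
}
```

The Int24 and Int32 variants would follow the same shape with a 3- or 4-byte stride and their own decode line.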

I made a method that returns the index of the maximum value. It compares the most significant bytes first and, when they are equal, moves on to the lower ones. With bigger ints it works even faster. Note that it assumes little-endian data and finds the largest signed value, not the largest magnitude.

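static int getMaxIndex(byte[] data, int byteLength)
{
    int MaxIndex = 0;
    // Sign bit of the current maximum (little-endian: the most significant byte is last).
    int signMax = data[byteLength - 1] >> 7;
    for (int i = byteLength; i + byteLength <= data.Length; i += byteLength)
    {
        int step = byteLength - 1;
        int compResult = 0;

        // Compare from the most significant byte down until the values differ.
        while (compResult == 0 && step > -1)
        {
            if (step == byteLength - 1)
            {
                int signData = data[i + step] >> 7;

                compResult = signData - signMax;
                // Parentheses are required: '-' binds tighter than '&'.
                if (compResult == 0) compResult = (data[MaxIndex + step] & 127) - (data[i + step] & 127);
            }
            else compResult = data[MaxIndex + step] - data[i + step];

            if (compResult < 0)
            {
                MaxIndex = i;
                // Take the sign from the most significant byte, not from the byte just compared.
                signMax = data[MaxIndex + byteLength - 1] >> 7;
            }
            step--;
        }
    }
    return MaxIndex;
}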

As was mentioned a few times already, avoid "if" statements inside your read loop; just make a separate function for reading Int16, Int24 and Int32, and select which one to use in advance.
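One way to "select which one to use in advance" is to pick a decoding function once, before the loop, so the loop body contains no width switch. This is a sketch: `SampleReaders`, `ForWidth` and the three little-endian decoders are hypothetical helpers. A delegate call per sample still costs an indirect call, so fully separate loops per width avoid even that.

```csharp
using System;

// Hypothetical helper class, for illustration only.
public static class SampleReaders
{
    // Little-endian decoders; each returns the sample starting at byte offset i as an int.
    private static int ReadInt16(byte[] d, int i) => (short)(d[i] | (d[i + 1] << 8));
    // Int24: the top byte is cast to sbyte so its sign is extended into the int.
    private static int ReadInt24(byte[] d, int i) => d[i] | (d[i + 1] << 8) | ((sbyte)d[i + 2] << 16);
    private static int ReadInt32(byte[] d, int i) => d[i] | (d[i + 1] << 8) | (d[i + 2] << 16) | (d[i + 3] << 24);

    // Called once, outside any loop, to choose the decoder for the sample width.
    public static Func<byte[], int, int> ForWidth(int bytesPerInt)
    {
        switch (bytesPerInt)
        {
            case 2: return ReadInt16;
            case 3: return ReadInt24;
            case 4: return ReadInt32;
            default: throw new ArgumentOutOfRangeException(nameof(bytesPerInt));
        }
    }
}
```

Usage: `var read = SampleReaders.ForWidth(bytesPerInt);` before the loop, then `int sample = read(data, i);` inside it.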

Personally I'd use System.IO.BinaryReader for this; it already contains functions for reading integers off streams, and unlike BitConverter, which depends on the system's endianness, BinaryReader is guaranteed to read values as little-endian; this is stated in the MSDN documentation.
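A small sketch of the endianness point: BinaryReader decodes {0x01, 0x02} as 0x0201 on any platform, whereas BitConverter.ToInt16 follows BitConverter.IsLittleEndian, so a portable BitConverter-based read has to reverse the bytes on big-endian hosts. The class `EndianCheck` is a hypothetical name used only for this illustration.

```csharp
using System;
using System.IO;

// Hypothetical helper class, for illustration only.
public static class EndianCheck
{
    // Reads the first Int16 of `bytes` with BinaryReader (always little-endian).
    public static short ReadWithBinaryReader(byte[] bytes)
    {
        using (var reader = new BinaryReader(new MemoryStream(bytes)))
        {
            return reader.ReadInt16();
        }
    }

    // BitConverter uses the machine's byte order; reversing on big-endian hosts
    // makes the result match the little-endian interpretation.
    public static short ReadWithBitConverter(byte[] bytes)
    {
        var copy = new byte[] { bytes[0], bytes[1] };
        if (!BitConverter.IsLittleEndian)
            Array.Reverse(copy);
        return BitConverter.ToInt16(copy, 0);
    }
}
```

On .NET Core 2.1 and later, System.Buffers.Binary.BinaryPrimitives.ReadInt16LittleEndian gives the same guarantee without a stream.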

Here is the basic function using BinaryReader, with Int32 as the example. In this version I let the EndOfStreamException take care of the end. They say throwing and handling exceptions is quite a heavy operation, but here it happens only once per call and replaces a lot of checks between reads, so it might be justified.

You could adapt this by replacing the while (true) with an actual check on the stream position: either compare ms.Position against the input byte array's length, or keep track of the position in your own variable, incrementing it by the number of bytes read in each step.

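// Requires: using System; using System.IO;
// Returns the largest magnitude (absolute value) among the little-endian Int32 values.
public static Int64 GetHighestValueInt32(Byte[] bytes)
{
    Int64 maxval = 0;
    try
    {
        using (MemoryStream ms = new MemoryStream(bytes))
        using (BinaryReader reader = new BinaryReader(ms))
        {
            while (true)
            {
                // Clearing the sign bit is not an absolute value in two's complement
                // (-1 would become Int32.MaxValue), so take Math.Abs on an Int64 instead;
                // Int64 also keeps Int32.MinValue from overflowing.
                Int64 val = Math.Abs((Int64)reader.ReadInt32());
                if (val > maxval)
                    maxval = val;
            }
        }
    }
    catch (EndOfStreamException)
    {
        // Finished reading!
    }
    return maxval;
}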
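Following the position-check idea above, here is a sketch of an exception-free variant for Int32. `PositionCheckedReader` and `GetHighestMagnitudeInt32` are hypothetical names; the loop stops as soon as fewer than 4 bytes remain, and the magnitude goes through `long` so that Int32.MinValue cannot overflow Math.Abs.

```csharp
using System;
using System.IO;

// Hypothetical helper class, for illustration only.
public static class PositionCheckedReader
{
    // Largest magnitude among the little-endian Int32 values in `bytes`.
    // Trailing bytes that do not form a whole Int32 are ignored.
    public static long GetHighestMagnitudeInt32(byte[] bytes)
    {
        long maxval = 0;
        using (var ms = new MemoryStream(bytes))
        using (var reader = new BinaryReader(ms))
        {
            // Loop while at least one whole Int32 is left in the stream.
            while (ms.Length - ms.Position >= 4)
            {
                long val = Math.Abs((long)reader.ReadInt32());
                if (val > maxval)
                    maxval = val;
            }
        }
        return maxval;
    }
}
```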

For Int16, the line reading val should instead be

Int32 val = Math.Abs((Int32)reader.ReadInt16());

and maxval and the return type should be Int32 rather than Int16, because the magnitude of Int16.MinValue (32768) does not fit in an Int16.

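BinaryReader can't natively read an Int24 off the stream, though. But the workaround isn't too hard. You can read a UInt32, shift it left by 8 bits and then arithmetically right by 8 bits as an Int32, which keeps the low three bytes and extends their sign, and then move the stream position back by one byte to compensate for the extra byte read. (Because every read needs 4 bytes, the last Int24 of an array whose length is a multiple of 3 is not reached this way.)

while (true)
{
    // The low 3 bytes are the Int24 (little-endian); << 8 drops the 4th byte, >> 8 on an Int32 sign-extends.
    Int32 val = Math.Abs(((Int32)(reader.ReadUInt32() << 8)) >> 8);
    // ReadUInt32 consumed 4 bytes, but an Int24 is only 3.
    ms.Position -= 1;
    if (val > maxval)
        maxval = val;
}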
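Without a stream, an Int24 can also be decoded directly from the array: the two low bytes are combined unsigned and the top byte is cast to sbyte so its sign is extended. A sketch assuming little-endian data; `Int24Magnitude` is a hypothetical name. Because it indexes the array directly, it also handles the final sample, which a 4-byte read cannot reach when only 3 bytes remain.

```csharp
using System;

// Hypothetical helper class, for illustration only.
public static class Int24Magnitude
{
    // Largest magnitude among little-endian Int24 values (3 bytes each) in `bytes`.
    // Trailing bytes that do not form a whole Int24 are ignored.
    public static int GetHighestValueInt24(byte[] bytes)
    {
        int maxval = 0;
        for (int i = 0; i + 3 <= bytes.Length; i += 3)
        {
            // (sbyte) on the top byte sign-extends bit 23 into the int.
            int val = bytes[i] | (bytes[i + 1] << 8) | ((sbyte)bytes[i + 2] << 16);
            // |Int24 minimum| = 8388608 fits in an int, so Math.Abs cannot overflow.
            val = Math.Abs(val);
            if (val > maxval)
                maxval = val;
        }
        return maxval;
    }
}
```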

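Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.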

 
© 2020-2024 STACKOOM.COM