
Representing a collection of binary data

I'm working with signals in which the samples consist of Floats. Some of the algorithms I've written only need to know when the signal crosses the x-axis (i.e. goes from a positive value to a negative value or vice versa). While doing these kinds of operations, I realized that I don't need to know the actual Float value of each sample. I just need to know whether the sample's value is positive or not.

I originally represented the signal as a Vector of Floats. After my discovery, I started representing it as a Vector of Boolean values (i.e. False for a negative value and True for a positive value). This turned out to be a lot more efficient, and it improved the program's performance in terms of both run time and memory consumption.
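To make the idea concrete, here is a minimal sketch of the reduction described above, written with plain lists so that it needs no extra packages. The names toSigns and countCrossings are hypothetical and chosen only for illustration, and treating a sample of exactly 0 as "not positive" is an assumption.

```haskell
-- Hypothetical helpers for illustration; not from the original post.

-- Reduce each sample to its sign: True for positive, False otherwise.
-- Assumption: a sample of exactly 0 is treated as "not positive".
toSigns :: [Float] -> [Bool]
toSigns = map (> 0)

-- Count the places where two neighbouring samples differ in sign.
countCrossings :: [Bool] -> Int
countCrossings signs = length (filter id (zipWith (/=) signs (drop 1 signs)))
```

For example, the samples 1.0, 0.5, -0.2, -0.7, 0.3 reduce to True, True, False, False, True, which contains two sign changes.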

I'm still wondering whether there is a more efficient way of representing this "collection of binary data", something like a bit vector or bit array. I've found a BitArray on Hackage, but it doesn't seem to support the same functionality that a Vector does.

Is there a more efficient way of representing the data for my use case, or should I stick to a Vector of Boolean values?

Both one-Bool-per-byte and one-Bool-per-bit options are available, from the vector and array packages respectively.

First, a Vector Bool from Data.Vector.Unboxed uses a byte array with one byte per Bool. This can be verified from the source in module Data.Vector.Unboxed.Base, where Vector Bool is defined as:

newtype instance Vector    Bool = V_Bool  (P.Vector    Word8)

and getting and setting are mediated through the functions:

fromBool :: Bool -> Word8
toBool :: Word8 -> Bool

Alternatively, it can be verified directly by profiling the program:

import Data.Vector.Unboxed as V

-- One billion Bools; at one byte per Bool this needs about 1 GB.
main :: IO ()
main = let v = V.replicate 1000000000 True
  in print (v ! 5)

and observing that it allocates just over 1,000,000,000 bytes.

Second, a UArray Int Bool from Data.Array.Unboxed is implemented as a bit vector, with one Bool per bit. The relevant source is in Data.Array.Base, where you can see the bit manipulation used in the instance:

instance IArray UArray Bool where
    ...
    unsafeAt (UArray _ _ _ arr#) (I# i#) = isTrue#
        ((indexWordArray# arr# (bOOL_INDEX i#) `and#` bOOL_BIT i#)
        `neWord#` int2Word# 0#)
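The primitive operations in that instance can be hard to read, so here is a rough sketch of the same kind of arithmetic using Data.Bits on ordinary boxed values. It is not the library's code: wordBits, wordIndex, bitMask and readBit are hypothetical names, a list of Words stands in for the underlying byte array, and indices are assumed to be non-negative.

```haskell
import Data.Bits (finiteBitSize, shiftL, (.&.))
import Data.Word (Word)

-- Hypothetical names for illustration; the real helpers in Data.Array.Base
-- are bOOL_INDEX and bOOL_BIT and operate on unboxed primitives.

-- Number of bits in one machine word (64 on most current platforms).
wordBits :: Int
wordBits = finiteBitSize (0 :: Word)

-- Which word of the backing storage holds bit i (assumes i >= 0).
wordIndex :: Int -> Int
wordIndex i = i `div` wordBits

-- A mask with only the bit for position i set inside that word.
bitMask :: Int -> Word
bitMask i = 1 `shiftL` (i `mod` wordBits)

-- Read bit i: select the word, mask out the bit, compare with zero.
-- A list of words stands in for the array, so lookup here is not O(1).
readBit :: [Word] -> Int -> Bool
readBit ws i = (ws !! wordIndex i .&. bitMask i) /= 0
```

The three steps in readBit correspond to indexWordArray#, and# and neWord# in the instance: pick the word, isolate one bit, and test whether the result is non-zero.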

Again, this can be verified directly by profiling:

import Data.Array.Unboxed as A

-- One billion Bools; at one bit per Bool this needs about 125 MB.
main :: IO ()
main = let v = A.listArray (1,1000000000) (repeat True) :: UArray Int Bool
  in print (v ! 5)

and verifying that it allocates approximately 125,000,000 bytes, that is, 1,000,000,000 bits divided by 8 bits per byte.
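Applied to the signal use case from the question, a bit-packed UArray could be used roughly as follows. This is only a sketch under stated assumptions: signBits and crossings are hypothetical helper names, the array is indexed from 0, and a sample of exactly 0 is stored as False.

```haskell
import Data.Array.Unboxed (UArray, bounds, listArray, (!))

-- Hypothetical helpers for illustration; not from the original post.

-- Pack the sign of each sample into a bit-packed array, indexed from 0.
-- Assumption: a sample of exactly 0 is stored as False ("not positive").
signBits :: [Float] -> UArray Int Bool
signBits xs = listArray (0, length xs - 1) (map (> 0) xs)

-- Indices i at which the sign differs between sample i and sample i + 1.
crossings :: UArray Int Bool -> [Int]
crossings arr = [i | i <- [lo .. hi - 1], arr ! i /= arr ! (i + 1)]
  where
    (lo, hi) = bounds arr
```

With the samples 1.0, 0.5, -0.2, -0.7, 0.3, crossings (signBits samples) yields the indices 1 and 3. The trade-off relative to Vector is the one raised in the question: the array package offers a smaller API, so operations such as slicing or zipping have to be written by hand with indexing.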
