Representing a collection of binary data
I'm working with signals whose samples are Floats. Some of the algorithms I've written only need to know when the signal crosses the x-axis (i.e. goes from a positive value to a negative one, or vice versa). While implementing these operations, I realized that I don't need the actual Float value of each sample; I just need to know whether the sample's value is positive or not.
I originally represented the signal as a Vector of Floats. After this realization, I started representing it as a Vector of Boolean values (i.e. False for a negative value and True for a positive value). This turned out to be a lot more efficient, and it improved the program's performance in terms of both run time and memory consumption.
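The conversion described above could look roughly like the sketch below. The names signs and crossings are hypothetical, and treating a sample of exactly zero as "not positive" is an assumption, since the question does not say how zero should be handled.

```haskell
import qualified Data.Vector.Unboxed as V

-- Hypothetical helper: keep only the sign of each sample.
-- A sample of exactly 0 is mapped to False (an assumption).
signs :: V.Vector Float -> V.Vector Bool
signs = V.map (> 0)

-- Count the positions where two consecutive samples differ in sign.
crossings :: V.Vector Bool -> Int
crossings v
  | V.null v  = 0
  | otherwise = V.length (V.filter id (V.zipWith (/=) v (V.tail v)))

main :: IO ()
main = print (crossings (signs (V.fromList [1.5, 0.2, -0.3, -1.0, 2.0])))
```

Once the signs are extracted, the crossing count only ever compares neighbouring Bools, which is why the Float values can be discarded.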
I'm still wondering whether there is a more efficient way of representing this "collection of binary data", something like a bit vector or bit array. I've found a BitArray on Hackage, but it doesn't seem to support the same functionality that a Vector does.
Is there a more efficient way of representing the data for my use case, or should I stick to a Vector of Boolean values?
One-Bool-per-byte and one-Bool-per-bit representations are available from the vector and array packages, respectively.
First, a Vector Bool from Data.Vector.Unboxed uses a byte array with one byte per Bool. This can be verified from the source in module Data.Vector.Unboxed.Base, where Vector Bool is defined as:
newtype instance Vector Bool = V_Bool (P.Vector Word8)
and getting and setting are mediated through the functions:
fromBool :: Bool -> Word8
toBool :: Word8 -> Bool
Alternatively, it can be verified directly by profiling the program:
import Data.Vector.Unboxed as V

main :: IO ()
main = let v = V.replicate 1000000000 True
       in print (v ! 5)
and observing that it allocates just over 1,000,000,000 bytes.
Second, a UArray Int Bool from Data.Array.Unboxed is implemented as a bit vector, with one Bool per bit. The relevant source is in Data.Array.Base, where you can see the bit manipulation used in the instance:
instance IArray UArray Bool where
    ...
    unsafeAt (UArray _ _ _ arr#) (I# i#) = isTrue#
        ((indexWordArray# arr# (bOOL_INDEX i#) `and#` bOOL_BIT i#)
        `neWord#` int2Word# 0#)
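The same index-then-mask idea can be written with ordinary Data.Bits operations. This is only an illustrative sketch, not the library's code: it assumes 64-bit words, the names wordIndex, bitOffset and readBit are hypothetical, and a list of words is used purely to keep the example short.

```haskell
import Data.Bits (setBit, shiftR, testBit, (.&.))
import Data.Word (Word64)

-- Which word holds bit i (equivalent to i `div` 64 for non-negative i).
wordIndex :: Int -> Int
wordIndex i = i `shiftR` 6

-- Position of bit i inside its word (equivalent to i `mod` 64).
bitOffset :: Int -> Int
bitOffset i = i .&. 63

-- Read bit i from a sequence of packed words.
readBit :: [Word64] -> Int -> Bool
readBit ws i = testBit (ws !! wordIndex i) (bitOffset i)

main :: IO ()
main = do
  -- Two words: bit 3 set in the first, bit 1 set in the second (overall bit 65).
  let ws = [setBit 0 3, setBit 0 1] :: [Word64]
  print (map (readBit ws) [3, 4, 65])
```

Each lookup costs a shift, a mask and a test on top of the memory read, which is the price paid for storing eight Bools per byte.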
Again, this can be verified directly by profiling:
import Data.Array.Unboxed as A

main :: IO ()
main = let v = A.listArray (1,1000000000) (repeat True) :: UArray Int Bool
       in print (v ! 5)
and verifying that it allocates approximately 125,000,000 bytes.
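For the zero-crossing use case in the question, the bit-packed representation might be used along these lines. This is a minimal sketch: signBits and crossings are hypothetical names, and a sample of exactly zero is assumed to count as "not positive".

```haskell
import Data.Array.Unboxed (UArray, bounds, listArray, (!))

-- Hypothetical helper: pack the sign of each sample into a UArray Int Bool.
-- A sample of exactly 0 is mapped to False (an assumption).
signBits :: [Float] -> UArray Int Bool
signBits xs = listArray (0, length xs - 1) (map (> 0) xs)

-- Count the positions where two consecutive samples differ in sign.
crossings :: UArray Int Bool -> Int
crossings arr = length [ () | i <- [lo .. hi - 1], arr ! i /= arr ! (i + 1) ]
  where
    (lo, hi) = bounds arr

main :: IO ()
main = print (crossings (signBits [1.5, 0.2, -0.3, -1.0, 2.0]))
```

Whether the eightfold memory saving outweighs the extra bit manipulation per access, and the smaller API compared with Vector, is something to measure on the actual workload.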