Writing bits to file?
I'm trying to implement a Huffman tree.
Contents of the simple .txt file I want to use for a test:
aaaaabbbbccd
Frequencies of characters: a:5, b:4, c:2, d:1
Code table (data type of the 1s and 0s: string):
a:0
d:100
c:101
b:11
Result that I want to write as binary (22 bits):
0000011111111101101100
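For reference, here is a minimal sketch of how that 22-character string can be produced from the code table. It assumes the table is held in a std::map<char, std::string>, and encode is a hypothetical name that does not come from the question. Its output is still text, one char per bit, which is exactly why it cannot simply be written to the file as-is:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: concatenate the Huffman code of every character.
// The result is a std::string of '0'/'1' characters, i.e. text (one byte
// per bit), not packed bits.
std::string encode(const std::string& text,
                   const std::map<char, std::string>& table)
{
    std::string bits;
    for (char ch : text) {
        bits += table.at(ch);  // throws std::out_of_range for unknown chars
    }
    return bits;
}
```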
How can I write each character of this result bit by bit, as binary, to a ".dat" file (not as a string)?
Answer: You can't.
The minimum amount you can write to a file (or read from it) is a char or unsigned char. For all practical purposes, a char has exactly eight bits.
You are going to need a one-char buffer and a count of the number of bits it holds. When that count reaches 8, write the buffer out and reset the count to 0. You will also need a way to flush the buffer at the end. (Note that you cannot write 22 bits to a file; you can only write 16 or 24. You will need some way to mark which bits at the end are unused.)
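Applied to the question's 22 bits, this buffer-and-count scheme needs (22 + 7) / 8 = 3 bytes and leaves 3 * 8 - 22 = 2 padding bits. Below is a minimal in-memory sketch. It assumes 8-bit bytes, bits packed high-order bit first, and zero padding, and packBits is a hypothetical name:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: pack a string of '0'/'1' characters into bytes,
// high-order bit first. The final byte is padded with zero bits, so the
// caller must remember bits.size() to know which trailing bits are padding.
std::vector<unsigned char> packBits(const std::string& bits)
{
    std::vector<unsigned char> out;
    unsigned char buffer = 0;  // bits collected so far
    unsigned count = 0;        // how many bits are in buffer
    for (char c : bits) {
        buffer = static_cast<unsigned char>((buffer << 1) | (c == '1' ? 1 : 0));
        if (++count == 8) {    // buffer full: emit it and start over
            out.push_back(buffer);
            buffer = 0;
            count = 0;
        }
    }
    if (count != 0) {          // flush: move the partial byte into the high bits
        out.push_back(static_cast<unsigned char>(buffer << (8 - count)));
    }
    return out;
}
```

For the question's input this yields the bytes 0x07, 0xFD, 0xB0, where the last two bits of 0xB0 are padding.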
Something like:
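#include <cstdio>

struct BitBuffer {
    std::FILE* file;           // Opened by the caller, e.g. with fopen(name, "wb").
    unsigned char buffer = 0;
    unsigned count = 0;

    explicit BitBuffer(std::FILE* f) : file(f) {}

    void outputBit(unsigned char bit) {
        buffer <<= 1;          // Make room for next bit.
        if (bit) buffer |= 1;  // Set if necessary.
        count++;               // Remember we have added a bit.
        if (count == 8) {
            std::fwrite(&buffer, sizeof(buffer), 1, file); // Error handling elided.
            buffer = 0;
            count = 0;
        }
    }

    // Call once at the end: pads the last partial byte with zero bits.
    void flush() {
        while (count != 0) outputBit(0);
    }
};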
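The struct above emits only complete bytes, and whoever reads the file later must also know where the real bits stop. One common way to mark the unused trailing bits (an assumption here, not something the answer prescribes) is to store the bit count in front of the data. Here is a sketch with hypothetical helpers writeBits and readBits. It assumes 8-bit bytes and fewer than 2^32 bits, writes a 4-byte big-endian count followed by the packed bytes, and reads them back:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: write a '0'/'1' string to a binary file as
// [4-byte big-endian bit count][packed bytes, high-order bit first].
bool writeBits(const char* path, const std::string& bits)
{
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    std::uint32_t n = static_cast<std::uint32_t>(bits.size());
    unsigned char header[4] = {
        static_cast<unsigned char>(n >> 24), static_cast<unsigned char>(n >> 16),
        static_cast<unsigned char>(n >> 8),  static_cast<unsigned char>(n)};
    bool ok = std::fwrite(header, 1, 4, f) == 4;
    unsigned char buffer = 0;
    unsigned count = 0;
    for (char c : bits) {
        buffer = static_cast<unsigned char>((buffer << 1) | (c == '1'));
        if (++count == 8) {
            ok = ok && std::fputc(buffer, f) != EOF;
            buffer = 0;
            count = 0;
        }
    }
    if (count != 0)  // pad the last byte with zero bits
        ok = ok && std::fputc(buffer << (8 - count), f) != EOF;
    return std::fclose(f) == 0 && ok;
}

// Hypothetical helper: read a file produced by writeBits back into a
// '0'/'1' string, ignoring the padding bits after the recorded count.
bool readBits(const char* path, std::string& bits)
{
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return false;
    unsigned char header[4];
    if (std::fread(header, 1, 4, f) != 4) { std::fclose(f); return false; }
    std::uint32_t n = (std::uint32_t{header[0]} << 24) | (std::uint32_t{header[1]} << 16) |
                      (std::uint32_t{header[2]} << 8) | header[3];
    bits.clear();
    while (bits.size() < n) {
        int byte = std::fgetc(f);
        if (byte == EOF) { std::fclose(f); return false; }  // truncated file
        for (int i = 7; i >= 0 && bits.size() < n; --i)    // high-order bit first
            bits += ((byte >> i) & 1) ? '1' : '0';
    }
    std::fclose(f);
    return true;
}
```

For the 22-bit example, the file is 7 bytes long: 4 header bytes plus 3 data bytes.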
The OP asked:
How can I write bit-by-bit each character of this result as a binary to ".dat" file? (not as string)
You cannot, and here is why...
Memory model
Defines the semantics of a computer memory storage for the purpose of C++ abstract machine.
The memory available to a C++ program is one or more contiguous sequences of bytes. Each byte in memory has a unique address.
Byte
A byte is the smallest addressable unit of memory. It is defined as a contiguous sequence of bits, large enough to hold the value of any UTF-8 code unit (256 distinct values; since C++14) and of any member of the basic execution character set (the 96 characters that are required to be single-byte). Similar to C, C++ supports bytes of sizes 8 bits and greater.
The types char, unsigned char, and signed char use one byte for both storage and value representation. The number of bits in a byte is accessible as CHAR_BIT or std::numeric_limits<unsigned char>::digits.
Compliments of cppreference.com
You can find this page here: cppreference:memory model
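Both ways of querying the byte width mentioned in the quote can be checked at compile time. Here is a minimal sketch, where bitsPerByte and bytesForBits are hypothetical helper names:

```cpp
#include <cassert>
#include <climits>
#include <limits>

// CHAR_BIT and std::numeric_limits<unsigned char>::digits report the same
// number: the bits in a byte, the smallest unit a file can hold.
static_assert(CHAR_BIT == std::numeric_limits<unsigned char>::digits,
              "both report the number of bits in a byte");
static_assert(CHAR_BIT >= 8, "C++ requires bytes of at least 8 bits");
static_assert(sizeof(char) == 1 && sizeof(unsigned char) == 1,
              "the char types occupy exactly one byte");

// Hypothetical helper: bits per byte on this implementation.
constexpr int bitsPerByte() { return CHAR_BIT; }

// Hypothetical helper: whole bytes needed to store nbits bits (rounded up).
constexpr unsigned long bytesForBits(unsigned long nbits)
{
    return (nbits + CHAR_BIT - 1) / CHAR_BIT;
}
```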
This comes from the 2017-03-21 working draft of the standard:
©ISO/IEC N4659
4.4 The C++ memory model [intro.memory]
- The fundamental storage unit in the C++ memory model is the byte. A byte is at least large enough to contain any member of the basic execution character set (5.3) and the eight-bit code units of the Unicode UTF-8 encoding form and is composed of a contiguous sequence of bits,[4] the number of which is implementation-defined. The least significant bit is called the low-order bit; the most significant bit is called the high-order bit. The memory available to a C++ program consists of one or more sequences of contiguous bytes. Every byte has a unique address.
- [ Note: The representation of types is described in 6.9. — end note ]
- A memory location is either an object of scalar type or a maximal sequence of adjacent bit-fields all having nonzero width. [ Note: Various features of the language, such as references and virtual functions, might involve additional memory locations that are not accessible to programs but are managed by the implementation. — end note ] Two or more threads of execution (4.7) can access separate memory locations without interfering with each other.
- [ Note: Thus a bit-field and an adjacent non-bit-field are in separate memory locations, and therefore can be concurrently updated by two threads of execution without interference. The same applies to two bit-fields, if one is declared inside a nested struct declaration and the other is not, or if the two are separated by a zero-length bit-field declaration, or if they are separated by a non-bit-field declaration. It is not safe to concurrently update two bit-fields in the same struct if all fields between them are also bit-fields of nonzero width. — end note ]
[ Example: A structure declared as
struct { char a; int b:5, c:11, :0, d:8; struct {int ee:8;} e; }
contains four separate memory locations: The field a and bit-fields d and e.ee are each separate memory locations, and can be modified concurrently without interfering with each other. The bit-fields b and c together constitute the fourth memory location. The bit-fields b and c cannot be concurrently modified, but b and a, for example, can be. — end example ]
4) The number of bits in a byte is reported by the macro CHAR_BIT in the header <climits>.
This version of the standard can be found here: www.open-std.org, section § 4.4 on pages 8 & 9.
The smallest unit of memory a program can write is a byte: 8 or more contiguous bits (exactly CHAR_BIT bits). Even with bit-fields, the 1-byte requirement still holds. You can manipulate, toggle, and set individual bits within a byte, but you cannot write individual bits.
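Here is a short sketch of that distinction, with hypothetical helper names and bit n counted from 0 at the low-order bit. Single bits inside an unsigned char can be set, cleared, toggled, and tested, yet even a struct whose only member is a 1-bit bit-field still occupies at least one whole byte:

```cpp
#include <cassert>

// Hypothetical helpers; n is 0 for the low-order bit and 7 for the
// high-order bit of an 8-bit byte.
unsigned char setBit(unsigned char b, unsigned n)    { return static_cast<unsigned char>(b | (1u << n)); }
unsigned char clearBit(unsigned char b, unsigned n)  { return static_cast<unsigned char>(b & ~(1u << n)); }
unsigned char toggleBit(unsigned char b, unsigned n) { return static_cast<unsigned char>(b ^ (1u << n)); }
bool testBit(unsigned char b, unsigned n)            { return (b >> n) & 1u; }

// The bit-field member is only one bit wide, but the object containing
// it still occupies at least one whole byte.
struct OneBit {
    unsigned flag : 1;
};
static_assert(sizeof(OneBit) >= 1, "no object is smaller than a byte");
```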
What can be done is to keep a byte buffer together with a count of the bits written to it. When your required bits are written, you will need to mark the rest of the unused bits as padding, or unused buffer bits.
Edit
[ Note: ] -- When using bit-fields or unions, one thing that you must take into consideration is the endianness of the specific architecture.
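Here is a sketch of how endianness shows up. The helper names are hypothetical, and a 32-bit std::uint32_t with 8-bit bytes is assumed. Inspecting the memory of a wider object, as a union or memcpy does, reveals the host's byte order. Shifts and masks, by contrast, operate on values and give the same bytes on every host, which is why shift-based bit buffers like the one above are not affected by endianness:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true if the lowest-addressed byte of a uint32_t
// holds its least significant byte (little-endian host).
bool hostIsLittleEndian()
{
    std::uint32_t value = 1;
    unsigned char first;
    std::memcpy(&first, &value, 1);  // inspect the object representation
    return first == 1;
}

// Hypothetical helper: split a value into bytes with shifts. The result is
// most significant byte first on every host, because shifts work on the
// value rather than on its layout in memory.
void toBigEndianBytes(std::uint32_t v, unsigned char out[4])
{
    out[0] = static_cast<unsigned char>(v >> 24);
    out[1] = static_cast<unsigned char>(v >> 16);
    out[2] = static_cast<unsigned char>(v >> 8);
    out[3] = static_cast<unsigned char>(v);
}
```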
Hello, from my experience I have found a simple way to do this. For the task, you need to define yourself an array of characters (for instance, it can be just 1 byte long, or bigger). After that, you must define functions to access a specific bit of any element. For example, here is an expression that gets the value of the 3rd bit of a char in C++:
/* position is in [1, 8]: position 1 is the least significant (low-order)
   bit of the byte. Returns 1 if that bit is set, 0 otherwise. */
int bit_at(int position, unsigned char byte)
{
    return (byte >> (position - 1)) & 1;
}
Now you can view the array of bytes as [b1, ..., bn].
What we actually have in memory is 8 * n bits. We can try to visualize it like so (NOTE: the array is zeroed!):
|0000 0000|0000 0000|...|0000 0000|
From this, you (or anyone else) can work out how to get a specific bit from this array. Of course, some index conversion is needed, but that is not much of a problem.
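Here is one way to do that conversion, sketched under the assumptions of 8-bit bytes and a zero-initialized array. The names getBitAt, setBitAt, and toBitArray are hypothetical and are not the answer author's code. Global bit i lives in byte i / 8, at shift 7 - i % 8 when bits are numbered from the high-order end so that they read left to right as in the diagrams:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helpers: treat a byte array as one long bit array.
// Bit 0 is the high-order bit of bytes[0], matching the |0111 0110| diagrams.
bool getBitAt(const std::vector<unsigned char>& bytes, std::size_t i)
{
    return (bytes[i / 8] >> (7 - i % 8)) & 1u;
}

void setBitAt(std::vector<unsigned char>& bytes, std::size_t i, bool value)
{
    unsigned char mask = static_cast<unsigned char>(1u << (7 - i % 8));
    if (value) bytes[i / 8] |= mask;
    else       bytes[i / 8] &= static_cast<unsigned char>(~mask);
}

// Hypothetical helper: copy a '0'/'1' string into a zeroed byte array
// just large enough to hold it.
std::vector<unsigned char> toBitArray(const std::string& bits)
{
    std::vector<unsigned char> bytes((bits.size() + 7) / 8, 0);
    for (std::size_t i = 0; i < bits.size(); ++i)
        setBitAt(bytes, i, bits[i] == '1');
    return bytes;
}
```

With the encoding below, toBitArray("011101100"), the bits of "abcd", gives the two bytes 0x76 and 0x00.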
In the end, for the encoding you provide, that is: a:0 d:100 c:101 b:11
We can encode the message "abcd" and make an array that holds the bits of the message, using the elements of the array as arrays of bits, like so:
|0111 0110|0000 0000|
You can write these bytes to the file, and you will have an excess of at most 7 bits. This is a simple example, but it can be extended much further. I hope this gives some answers to your question.
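Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.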