Reading hard disk sector raw data - Why hex?

I'm trying to read a hard disk sector to get the raw data. After searching a lot, I found that some people store that raw sector data as hex and some as char.

Which is better, and why? Which will give me better performance?

I'm writing it in C++ and the OS is Windows.

For clarification -

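#include <iostream>
#include <windows.h>
#include <winioctl.h>

int main() {
    DWORD nRead = 0;
    char buf[512];

    // Opening a physical drive normally requires administrator rights.
    HANDLE hDisk = CreateFileA("\\\\.\\PhysicalDrive0",
        GENERIC_READ, FILE_SHARE_READ,
        NULL, OPEN_EXISTING, 0, NULL);
    if (hDisk == INVALID_HANDLE_VALUE) {
        std::cerr << "CreateFile failed, error " << GetLastError() << '\n';
        return 1;
    }

    // Seek to byte offset 0xA00 (sector 5 with 512-byte sectors), then read one sector.
    SetFilePointer(hDisk, 0xA00, NULL, FILE_BEGIN);
    if (!ReadFile(hDisk, buf, sizeof buf, &nRead, NULL)) {
        std::cerr << "ReadFile failed, error " << GetLastError() << '\n';
        CloseHandle(hDisk);
        return 1;
    }

    // Print the bytes that were actually read, as characters, unconverted.
    for (DWORD currentpos = 0; currentpos < nRead; currentpos++) {
        std::cout << buf[currentpos];
    }
    CloseHandle(hDisk);
    std::cin.get();
    return 0;
}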

Consider the above code written by someone else and not me.

Notice the data type char buf[512]; the data is stored as char and has not been converted to hex.

Raw data is just "raw data": you store it as it is, you do not convert it. So there is no performance issue here. At most, the difference is in how the raw data is represented in a human-readable format. In general:

  • representing it in char format makes it easier to see whether it contains any text,
  • while hex is better for representing numeric data (in case it follows some kind of pattern).

In your specific case: char just means 1 byte, so you can be sure you are storing your data in a 512-byte buffer. Allocating that space in terms of an integer size makes things unnecessarily complicated.
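To illustrate that char and hex are only two views of the same stored bytes, here is a minimal sketch. The helper names to_hex and to_text are made up for this example, and the rule of showing non-printable bytes as '.' is just an assumption borrowed from how hex editors usually behave.

#include <cstddef>
#include <string>

// Render each byte as two hex digits, separated by spaces.
std::string to_hex(const unsigned char* data, std::size_t len) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (std::size_t i = 0; i < len; ++i) {
        if (i != 0) out += ' ';
        out += digits[data[i] >> 4];    // high four bits
        out += digits[data[i] & 0x0F];  // low four bits
    }
    return out;
}

// Render each byte as a character; non-printable bytes become '.'.
std::string to_text(const unsigned char* data, std::size_t len) {
    std::string out;
    for (std::size_t i = 0; i < len; ++i) {
        bool printable = data[i] >= 0x20 && data[i] < 0x7F;
        out += printable ? static_cast<char>(data[i]) : '.';
    }
    return out;
}

For the six bytes 4E 54 46 53 00 FF, to_hex returns "4e 54 46 53 00 ff" and to_text returns "NTFS..". The buffer itself never changes, only the way it is printed.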

You have got yourself confused.

The data on a disk is stored as binary, just a long ass stream of ones and zeros.

The reason it is read in hex or char format is that those are easier to work with.

decimal: 36
char:    z (potentially one way of representing this value)
hex:     24
binary:  100100

The binary is the raw bit stream you would read from the disk or from memory. Hex is a shorthand representation of it; the two are completely interchangeable, and one hex digit simply represents four bits. Again, the decimal is just yet another way to represent that value.

The char however is a little bit tricky. For my representation, I have taken the characters 0-9 to represent the values 0-9 and then a-z to represent the values from 10 upwards (strictly, z then lands on 35 rather than 36, which shows how arbitrary such a mapping is). Equally, I could have taken the standard ASCII value, which would give me '$'.
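As a rough sketch of the same idea in code, the functions below print one byte value in each of the four notations. The function names are invented for this illustration, and the char view uses the plain ASCII interpretation.

#include <bitset>
#include <sstream>
#include <string>

// Decimal digits of the byte value.
std::string as_decimal(unsigned char v) {
    return std::to_string(static_cast<unsigned>(v));
}

// Hexadecimal digits of the byte value (no leading zero, no prefix).
std::string as_hex(unsigned char v) {
    std::ostringstream os;
    os << std::hex << static_cast<unsigned>(v);
    return os.str();
}

// All eight bits, most significant first.
std::string as_binary(unsigned char v) {
    return std::bitset<8>(v).to_string();
}

// The single character whose code is the byte value.
std::string as_ascii(unsigned char v) {
    return std::string(1, static_cast<char>(v));
}

For the value 36 these give "36", "24", "00100100" and "$". The byte in memory is identical in all four cases.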

As to why 'char' is used when dealing with bytes: the C++ 'char' type is just a single byte (which is normally 8 bits).
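The language itself guarantees this, which a short sketch can check at compile time. The constant kSectorSize and the function sector_buffer_bytes are hypothetical names, and 512 is simply the sector size assumed in the question.

#include <climits>
#include <cstddef>

// sizeof(char) is 1 by definition; CHAR_BIT is the number of bits in that byte.
static_assert(sizeof(char) == 1, "char is exactly one byte");
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

constexpr std::size_t kSectorSize = 512;  // assumed sector size

// A char-typed array of kSectorSize elements occupies exactly kSectorSize bytes.
std::size_t sector_buffer_bytes() {
    unsigned char buf[kSectorSize];
    return sizeof buf;
}

Using unsigned char for the buffer is a design choice rather than a requirement: it avoids surprises with negative values when individual bytes are later widened to int.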

I will also point out the problem with negative numbers. When an integer is signed (it can be positive or negative), the first bit (the most significant) represents a large negative value, such that if all bits are 'one' the value is -1. For example, with four bits, so it is easy to see:

0010 = +2
1000 = -8
0110 = +6
1110 = -2
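The four-bit examples can be reproduced with a small sketch. The names as_signed4 and as_unsigned4 are made up here, and the signed reading assumes two's complement, which is what the description above amounts to.

// Interpret the low four bits as a signed two's-complement value.
int as_signed4(unsigned bits) {
    bits &= 0xF;
    // The top bit carries a weight of -8; the other three carry +4, +2, +1.
    return static_cast<int>(bits & 0x7) - static_cast<int>(bits & 0x8);
}

// Interpret the same four bits as an unsigned value.
unsigned as_unsigned4(unsigned bits) {
    return bits & 0xF;
}

With the pattern 1110, as_signed4 returns -2 while as_unsigned4 returns 14: the same bits, two different readings.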

The key to this problem is that it is all just how you interpret/represent the binary values. The same sequence of bits can be represented more or less any way you want.

I'm guessing you're talking about the final data being written to some file. The reason to use hex is that it's easier to read and harder to mess up. Generally, if someone is doing some sort of human analysis on the sector, they're going to use a hex editor on the raw data anyway, so if you output it as hex you skip the need for a hex viewer/editor.

For instance, on DOS/Windows you have to make sure you open the file in binary mode if you're going to write the bytes as characters; in text mode, newline bytes get translated along the way. You might also have to make sure that the operating system doesn't mess with the character format anywhere in between.
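A minimal sketch of writing and reading raw bytes in binary mode with the standard file streams follows. The function names write_raw and read_raw are hypothetical. As far as I know, a Windows text-mode stream would expand byte 0x0A to 0x0D 0x0A on output and may treat 0x1A as end of file on input, which is exactly what std::ios::binary avoids.

#include <fstream>
#include <string>
#include <vector>

// Write the bytes unchanged; std::ios::binary disables newline translation.
bool write_raw(const std::string& path, const std::vector<unsigned char>& data) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out.write(reinterpret_cast<const char*>(data.data()),
              static_cast<std::streamsize>(data.size()));
    return static_cast<bool>(out);
}

// Read the whole file back, byte for byte.
std::vector<unsigned char> read_raw(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::vector<unsigned char> data;
    char c;
    while (in.get(c)) {
        data.push_back(static_cast<unsigned char>(c));
    }
    return data;
}

Writing the bytes 0A 0D 1A 00 FF with write_raw and reading them back with read_raw should return the same five bytes, which is the round trip you want for sector data.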
