
How to convert the content of a .dat file to a human readable form using Python?

There is a file called "settings.dat" which I want to read and edit. When I open this file in Notepad, the contents are unreadable.

I think this is probably a binary file, and as far as I can tell the encoding is UTF-16. This is how I tried to convert it:

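# Read the file as raw bytes, without any text decoding.
with open('settings.dat', 'rb') as binary_file:
    raw_data = binary_file.read()
    # Interpret the bytes as UTF-16; 'ignore' silently drops invalid sequences.
    str_data = raw_data.decode('utf-16', 'ignore')
    print(str_data)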

The output is still unreadable, with characters that look Chinese. Isn't this supposed to be a simple bytes-to-string conversion? Here is the output:

䕗䙃h 3 Ԁ ː ᙫ ␐☐ᜐ┐Ⱀ⨐ᴐሐ⼐【ㄐ㈐䠐倐䬐䴐ᄐἐḐ‐점퀐쬐촐

.dat files are generic files and can be either binary or text. They are usually read and written only by the application that created them, and each application structures its .dat files differently. Hence, unlike .gif or .docx files, .dat files follow no common format.

To interpret a .dat file and convert it to a human-readable form, you first need to know how the application that uses it reads and writes it.
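Before guessing at encodings, a cheap first check is whether the file starts with the signature ("magic number") of a well-known container format. The following is a minimal Python sketch; the signature table is an illustrative assumption covering only a handful of common formats, and a file that matches none of them is most likely in the application's own format.

```python
# A few well-known file signatures ("magic numbers"). This list is
# illustrative, not exhaustive; longer signatures are checked first.
SIGNATURES = [
    (b"SQLite format 3\x00", "SQLite database"),
    (b"PK\x03\x04", "ZIP archive (also .docx/.xlsx)"),
    (b"\x1f\x8b", "gzip-compressed data"),
    (b"\xef\xbb\xbf", "UTF-8 text with BOM"),
    (b"\xff\xfe", "UTF-16 little-endian text with BOM (or UTF-32 LE)"),
    (b"\xfe\xff", "UTF-16 big-endian text with BOM"),
]


def sniff(header: bytes) -> str:
    """Return a description of the first signature that `header` starts with."""
    for magic, description in SIGNATURES:
        if header.startswith(magic):
            return description
    return "unknown: probably an application-specific format"


def sniff_file(path: str) -> str:
    # 16 bytes is enough to cover every signature in the table above.
    with open(path, "rb") as f:
        return sniff(f.read(16))
```

For example, `print(sniff_file("settings.dat"))`. Note that .docx files are ZIP archives, which is why they are easy to recognise while a .dat file usually is not.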

As for the Chinese characters: decoding the binary .dat file as UTF-16 does not change the file's content. You are just grouping the bytes in pairs, so each 16-bit unit bbbb bbbb bbbb bbbb (where each b is a bit) becomes one code point xxxx (where each x is a hexadecimal digit).
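To see the pairing concretely, here is a small Python sketch. The sample bytes are made up for illustration and are not taken from settings.dat. It decodes them as little-endian UTF-16 (the question's plain 'utf-16' picks the byte order from a BOM if there is one) and then rebuilds the same code points by hand with `int.from_bytes`.

```python
# Made-up sample bytes for illustration; not read from settings.dat.
data = b"WECF"

# The codec groups the bytes in pairs and treats each pair as one
# 16-bit code unit (little-endian: the first byte is the low byte).
decoded = data.decode("utf-16-le")

# The same grouping done by hand: bytes 0-1, then bytes 2-3.
manual = "".join(
    chr(int.from_bytes(data[i:i + 2], "little"))
    for i in range(0, len(data), 2)
)

for ch in decoded:
    print(f"U+{ord(ch):04X}")  # U+4557, then U+4643
print(decoded == manual)       # True
```

Both resulting code points fall in a CJK ideograph block, which is how four plain ASCII letters can come out looking like Chinese.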

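Many of the code points in that 16-bit range are Chinese characters (technically called CJK ideographs), while others are unassigned, aka reserved. That is why arbitrary binary data read as UTF-16 tends to come out looking like Chinese text.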
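A rough way to check this claim is to classify every 16-bit value with Python's `unicodedata` module. The sketch below counts CJK unified ideographs, Hangul syllables (the last few characters of the question's output are actually Korean Hangul), unassigned code points and everything else. The bucket names are my own, and the exact counts depend on the Unicode version bundled with your Python.

```python
import unicodedata
from collections import Counter


def classify(code_point: int) -> str:
    """Put one 16-bit value into a coarse, hand-picked bucket."""
    ch = chr(code_point)
    if unicodedata.category(ch) == "Cn":  # Cn = unassigned (reserved)
        return "unassigned"
    name = unicodedata.name(ch, "")  # "" for controls, surrogates, private use
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "CJK ideograph"
    if name.startswith("HANGUL SYLLABLE"):
        return "Hangul syllable"
    return "other"


counts = Counter(classify(cp) for cp in range(0x10000))
for label, n in counts.most_common():
    print(f"{label:16} {n:6}  ({n / 0x10000:.1%})")
```

CJK ideographs and Hangul syllables together cover a large share of the 65,536 possible values, so random byte pairs land on them often.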

Not a Python answer, but the strings command-line tool is often invaluable when reverse engineering data formats: it lets you quickly skim through a binary in search of familiar plaintext patterns. Obviously, if some kind of encryption or compression is used (gzip is common), it won't help on its own; the data needs to be decrypted or decompressed first.

Calling it is as simple as:

user@host:~/ $ strings mydir/settings.dat
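If you would rather stay in Python, the core of what strings does (print runs of at least four printable ASCII characters) fits in a few lines. The sketch below also gunzips input that starts with the gzip signature, as one example of the decompression step mentioned above, and looks for ASCII text stored as UTF-16LE, similar in spirit to `strings -e l`. It is an approximation, not a reimplementation of GNU strings; for instance, it does not treat tabs as printable.

```python
import gzip
import re
import sys

MIN_LEN = 4  # same default minimum length as GNU strings


def ascii_strings(data: bytes, min_len: int = MIN_LEN):
    """Runs of printable ASCII (space through '~') at least min_len bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.group().decode("ascii") for m in re.finditer(pattern, data)]


def utf16le_strings(data: bytes, min_len: int = MIN_LEN):
    """Printable ASCII characters stored as UTF-16LE (each followed by 0x00)."""
    pattern = rb"(?:[\x20-\x7e]\x00){%d,}" % min_len
    return [m.group().decode("utf-16-le") for m in re.finditer(pattern, data)]


def load(path: str) -> bytes:
    with open(path, "rb") as f:
        data = f.read()
    # Compression hides readable text; gunzip first if the gzip magic is present.
    if data.startswith(b"\x1f\x8b"):
        data = gzip.decompress(data)
    return data


if __name__ == "__main__" and len(sys.argv) > 1:
    blob = load(sys.argv[1])
    for s in ascii_strings(blob) + utf16le_strings(blob):
        print(s)
```

Saved under any name, e.g. `strings_sketch.py`, it would be run as `python strings_sketch.py mydir/settings.dat`.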

If it's a binary file, why do you want to view it as text? Unless you know beforehand that settings.dat contains human-readable characters, it makes no sense to hunt for an encoding that makes the output human-readable, because you won't succeed.

On the other hand, if you do know that settings.dat contains human-readable characters, then maybe utf-16 is the wrong encoding.
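One practical point: decoding with errors='ignore', as in the question, silently drops invalid bytes, so almost any encoding appears to "work". Decoding strictly lets the codec tell you which candidates are impossible. The sketch below tries an assumed shortlist of candidate encodings; latin-1 maps every byte to some character and therefore never fails, so success alone does not prove an encoding is right.

```python
# An assumed shortlist of candidates; adjust it to what you expect the
# application to use. latin-1 never fails, so it is kept last as a fallback.
CANDIDATES = ["utf-8", "utf-16-le", "utf-16-be", "cp1252", "latin-1"]


def decode_candidates(data: bytes, encodings=CANDIDATES):
    """Map each encoding to its strict decoding, or None if decoding raises."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = data.decode(enc)  # strict by default: no 'ignore'
        except UnicodeDecodeError:
            results[enc] = None
    return results


sample = "héllo!".encode("utf-8")  # made-up input: 7 bytes of UTF-8
for enc, text in decode_candidates(sample).items():
    print(f"{enc:10} {'(fails)' if text is None else repr(text)}")
```

Here both UTF-16 variants fail because the sample has an odd number of bytes, while cp1252 and latin-1 both "succeed" with the mojibake 'hÃ©llo!'; only UTF-8 gives the intended text.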
