
Converting DOS text files to Unicode using Python

I am trying to write a Python application for converting old DOS code page text files to their Unicode equivalent. Now, I have done this before using Turbo Pascal by creating a look-up table and I'm sure the same can be done using a Python dictionary. My question is: How do I index into the dictionary to find the character I want to convert and send the equivalent Unicode to a Unicode output file?

I realize that this may be a repeat of a similar question but nothing I searched for here quite matches my question.
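For reference, the look-up-table approach described in the question can be sketched directly with a Python dictionary. This is a minimal illustration, not a full converter: `CP437_TABLE` and `convert` are made-up names, and only a handful of the 256 entries are shown (in practice, `bytes.decode('cp437')` applies the complete table for you, as the answers below show).

```python
# Sketch of the Turbo Pascal look-up-table idea in Python: a dict
# mapping DOS CP437 byte values to their Unicode characters.
# Only a few high-byte entries are shown; a real table has 256.
CP437_TABLE = {
    0x80: '\u00C7',  # Ç
    0x81: '\u00FC',  # ü
    0x82: '\u00E9',  # é
    0xE9: '\u0398',  # Θ
}

def convert(raw: bytes) -> str:
    # Bytes below 0x80 are plain ASCII in CP437, so they map to themselves.
    return ''.join(CP437_TABLE.get(b, chr(b)) for b in raw)

print(convert(b'\x82tude'))  # -> étude
```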

Python has codecs built in to do the conversion:

#!/usr/bin/env python3

# Test file with bytes 0-255.
with open('dos.txt', 'wb') as f:
    f.write(bytes(range(256)))

# Read the file, decoding with code page 437 (DOS OEM-US).
# Write it out as UTF-8 ("Unicode" itself is not an encoding;
# UTF-8, UTF-16, and UTF-32 are encodings that cover all Unicode code points).

with open('dos.txt', encoding='cp437') as infile:
    with open('unicode.txt', 'w', encoding='utf8') as outfile:
        outfile.write(infile.read())
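A quick way to spot-check the conversion is to decode a few CP437 bytes directly and compare the byte counts; the sample values below are taken from the standard CP437 table.

```python
# Spot-check the CP437 decoding on a few bytes from the OEM-US table.
sample = b'\x80\x9b\xe9'             # Ç, ¢, Θ in CP437
text = sample.decode('cp437')
print(text)                          # -> Ç¢Θ

# In UTF-8, each of these characters takes 2 bytes instead of 1.
utf8_bytes = text.encode('utf8')
print(len(sample), len(utf8_bytes))  # -> 3 6
```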

You can also use the standard built-in `decode` method of `bytes` objects, or simply stream the conversion line by line:

with open('dos.txt', 'r', encoding='cp437') as infile, \
        open('unicode.txt', 'w', encoding='utf8') as outfile:
    for line in infile:
        outfile.write(line)
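The explicit `bytes.decode` variant mentioned above looks like this; the snippet first creates a small CP437 sample file so it runs on its own (the filenames match the earlier examples).

```python
# Create a small CP437 sample file, then convert it via bytes.decode().
with open('dos.txt', 'wb') as f:
    f.write(b'Caf\x82')              # 'Café' in CP437 (0x82 = é)

with open('dos.txt', 'rb') as infile:
    raw = infile.read()              # raw bytes, no decoding yet
text = raw.decode('cp437')           # explicit decode of the bytes object

with open('unicode.txt', 'w', encoding='utf8') as outfile:
    outfile.write(text)

print(text)  # -> Café
```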
