I am trying to write a Python application for converting old DOS code page text files to their Unicode equivalent. Now, I have done this before using Turbo Pascal by creating a look-up table and I'm sure the same can be done using a Python dictionary. My question is: How do I index into the dictionary to find the character I want to convert and send the equivalent Unicode to a Unicode output file?
I realize that this may be a repeat of a similar question but nothing I searched for here quite matches my question.
Python has the codecs to do the conversions:
#!python3
# Test file with bytes 0-255.
with open('dos.txt','wb') as f:
    f.write(bytes(range(256)))

# Read the file and decode using code page 437 (DOS OEM-US).
# Write the file as UTF-8 encoding ("Unicode" is not an encoding)
# UTF-8, UTF-16, UTF-32 are encodings that support all Unicode codepoints.
with open('dos.txt',encoding='cp437') as infile:
    with open('unicode.txt','w',encoding='utf8') as outfile:
        outfile.write(infile.read())
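If you still want the explicit look-up table described in the question, here is a minimal sketch that builds the dictionary from the cp437 codec instead of typing it by hand. The names CP437_TABLE and convert are made up for illustration, and the approach assumes the whole input fits in memory as a bytes object.

```python
# Hypothetical look-up table: byte value (0-255) -> Unicode character,
# generated from Python's own cp437 codec rather than written by hand.
CP437_TABLE = {b: bytes([b]).decode('cp437') for b in range(256)}

def convert(data: bytes) -> str:
    """Translate code page 437 bytes to a str by indexing the table."""
    # Iterating over a bytes object yields integers, which are the dict keys.
    return ''.join(CP437_TABLE[b] for b in data)

print(convert(b'caf\x82'))  # byte 0x82 is "é" in code page 437
```

Indexing is just CP437_TABLE[byte_value]. In practice the codec does the same mapping for you, so the dictionary is mainly useful if you need to override individual entries.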
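You can also use the standard built-in decode method of bytes objects, or let open do the decoding through its encoding argument and copy the file line by line: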
with open('dos.txt', 'r', encoding='cp437') as infile, \
        open('unicode.txt', 'w', encoding='utf8') as outfile:
    for line in infile:
        outfile.write(line)
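For data that is already in memory, the decode method of bytes mentioned above might look like the following sketch. The sample bytes and the variable names raw, text and utf8 are assumptions chosen for illustration.

```python
# Sample DOS bytes: "H", 0x82 (é in code page 437), then "llo".
raw = bytes([0x48, 0x82, 0x6C, 0x6C, 0x6F])

# bytes.decode turns code page 437 bytes into a Unicode str.
text = raw.decode('cp437')

# str.encode produces the UTF-8 bytes you would write to the output file.
utf8 = text.encode('utf-8')

print(text)
print(utf8)
```

This does the same conversion as the file-based version, only without letting open handle the decoding and encoding.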