
Reading txt files in python

I'm currently following "Learn Python the Hard Way". However, when I use the .read() command on my .txt files it outputs the text in a very weird way, with extra spaces, and a square at the start:

Extra spaces and a square.

The console is Windows Powershell.

My code looks like this:

from sys import argv #imports argv from sys

script, filename = argv #unpacks script and filename from argv

txt = open(filename) #declares the variable txt as the text in filename

print "Here's your file %r" % filename #prints the string and the filename
print txt.read() #prints a reading of txt
txt.close()

print "Type the filename again:" #prints the string
file_again = raw_input("> ") #declares the variable file_again as the raw input

txt_again = open(file_again) #declares the variable txt_again as the text in file_again

print txt_again.read() #prints a reading of txt_again
txt_again.close() #closes the second file (txt was already closed above)

And the file looks like this:

This is stuff I typed into a file.
It is really cool stuff.
Lots and lots of fun to have in here.

Please help!

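Your file seems to be encoded with a 2-byte encoding, presumably UTF-16. Python 2 does not guess the encoding, so read() returns the raw bytes and print outputs them as they are. For ASCII-only text this means every other byte is a readable character and the bytes in between are NUL, which the console shows as extra spaces. The square at the start is most likely the byte order mark (BOM) that begins the file.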
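To see why the output looks that way, here is a small sketch that encodes a sample string as UTF-16 and inspects the bytes. The sample string is made up for illustration, and the exact BOM bytes depend on the machine's byte order.

```python
from __future__ import print_function

import codecs

# Illustrative sample text, not the asker's actual file.
text = u"This is stuff"

# The "utf-16" codec writes a byte order mark (BOM) first, then two bytes
# per character for text like this.
raw = text.encode("utf-16")

# The BOM is FF FE (little-endian) or FE FF (big-endian).
has_bom = raw[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
print("starts with a BOM:", has_bom)

# Every ASCII character is paired with a NUL byte; printed raw, those NULs
# are what show up as the extra "spaces".
print("NUL bytes:", raw.count(b"\x00"), "for", len(text), "characters")
print(repr(raw))
```

Printing `raw` directly to a console reproduces the symptom in the question. Decoding it first gives back the original text.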

If you're using Python 2.7.x, take the byte string that read() returns and decode it:

text = txt.read().decode("utf-16")
print text

That should output the file in a readable way. As has already been pointed out, the file seems to be encoded in UTF-16, so this shouldn't be taken as "the way to read text files"; the right codec depends on how the file was saved. If you use Notepad++ you can select the file encoding from the "Encoding" menu. Microsoft Notepad lets you select the encoding in the "Save as..." dialog.
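An alternative to calling decode() afterwards is to let the file object decode while it reads. The sketch below uses io.open, which takes an encoding argument in Python 2.7 and 3. The temporary file and its contents are made up here only so the example is self-contained.

```python
from __future__ import print_function

import io
import os
import tempfile

# Hypothetical sample file, created only so this sketch can run on its own.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with io.open(path, "w", encoding="utf-16") as f:
    f.write(u"This is stuff I typed into a file.\n")

# io.open decodes while reading; the "utf-16" codec consumes the BOM,
# so no stray character is left at the start of the text.
with io.open(path, encoding="utf-16") as f:
    text = f.read()

print(text)
```

In the asker's script this would mean replacing open(filename) with io.open(filename, encoding="utf-16"), assuming the file really is UTF-16.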

Take a look at https://docs.python.org/2/howto/unicode.html

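Either your file is saved in a Unicode encoding such as UTF-16, or PowerShell is doing something funny with the encoding. The link above explains how to open Unicode files in Python 2.x. The relevant portion is below; its example uses UTF-8, so for a UTF-16 file you would pass encoding='utf-16' instead: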

import codecs
f = codecs.open('unicode.rst', encoding='utf-8')
for line in f:
  print repr(line)
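To check which of the two explanations applies, one option is to look at the first bytes of the file for a byte order mark. The helper below, guess_bom_encoding, is a hypothetical name introduced for this sketch and is only a heuristic: a file without a BOM falls back to the default. It uses io.open rather than codecs.open, and both accept an encoding argument.

```python
from __future__ import print_function

import codecs
import io
import os
import tempfile

# UTF-32 BOMs are checked first because the UTF-32 LE BOM begins with the
# same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_bom_encoding(path, default="utf-8"):
    """Return a codec name based on the file's BOM, or default if none is found."""
    with io.open(path, "rb") as f:
        head = f.read(4)
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return default


# Hypothetical sample file, saved as UTF-16 the way Notepad's "Unicode" option would.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with io.open(path, "w", encoding="utf-16") as f:
    f.write(u"line one\nline two\n")

encoding = guess_bom_encoding(path)
print("detected:", encoding)

with io.open(path, encoding=encoding) as f:
    lines = [line for line in f]
for line in lines:
    print(repr(line))
```

If the helper reports utf-16 for the file in question, the file itself is the cause and PowerShell is not involved.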

