[英]Python program will not read a txt file in 16bit characters
我的问题是如何让python读取文本为16位字符的文件。 文章的其余部分描述了这种情况。
我有一个文本文件,它是从iTunes导出的播放列表。 这是包含标题的一小段
Name Artist Composer Album Grouping Work Movement Number Movement Count Movement Name Genre Size Time Disc Number Disc Count Track Number Track Count Year Date Modified Date Added Bit Rate Sample Rate Volume Adjustment Kind Equalizer Comments Plays Last Played Skips Last Skipped My Rating
Keyboard Works of the Masters Randolph Hokanson Pan125b 2054816 64 03/11/2017, 18:00 03/11/2017, 17:01 256 44100 MPEG audio file 1 03/11/2017, 17:02 4 08/03/2018, 16:07
08 Traccia 08 11159905 464 03/11/2017, 17:39 03/11/2017, 16:59 192 48000 MPEG audio file 1 03/11/2017, 16:59
09 Traccia 09 17787361 741 03/11/2017, 17:39 03/11/2017, 16:58 192 48000 MPEG audio file 5 08/03/2018, 10:58
10 Traccia 10 10128290 421 03/11/2017, 17:39 03/11/2017, 16:58 192 48000 MPEG audio file 1 03/11/2017, 16:58
当我使用此代码读取它时,程序挂起。 (i保存文件中的行数)。 接下来的十六进制转储似乎表明从iTunes导出的内容为16位字符。
读取文本文件的完整代码是
file_name="full path to file goes here"
f = open(file_name, "r")
i=227
for x in range(0, i):
line = f.readline()
当我将代码读入文本编辑器时,选择了所有文本,并将其粘贴到新文档中。 该代码工作正常。
原始文件一部分的文本转储看起来像这样,从下面的新文件开始
00000000: FF FE 4E 00 61 00 6D 00 65 00 09 00 41 00 72 00 ..N.a.m.e...A.r.
00000010: 74 00 69 00 73 00 74 00 09 00 43 00 6F 00 6D 00 t.i.s.t...C.o.m.
00000020: 70 00 6F 00 73 00 65 00 72 00 09 00 41 00 6C 00 p.o.s.e.r...A.l.
00000030: 62 00 75 00 6D 00 09 00 47 00 72 00 6F 00 75 00 b.u.m...G.r.o.u.
00000040: 70 00 69 00 6E 00 67 00 09 00 57 00 6F 00 72 00 p.i.n.g...W.o.r.
00000050: 6B 00 09 00 4D 00 6F 00 76 00 65 00 6D 00 65 00 k...M.o.v.e.m.e.
00000060: 6E 00 74 00 20 00 4E 00 75 00 6D 00 62 00 65 00 n.t. .N.u.m.b.e.
00000070: 72 00 09 00 4D 00 6F 00 76 00 65 00 6D 00 65 00 r...M.o.v.e.m.e.
00000080: 6E 00 74 00 20 00 43 00 6F 00 75 00 6E 00 74 00 n.t. .C.o.u.n.t.
00000090: 09 00 4D 00 6F 00 76 00 65 00 6D 00 65 00 6E 00 ..M.o.v.e.m.e.n.
000000A0: 74 00 20 00 4E 00 61 00 6D 00 65 00 09 00 47 00 t. .N.a.m.e...G.
000000B0: 65 00 6E 00 72 00 65 00 09 00 53 00 69 00 7A 00 e.n.r.e...S.i.z.
000000C0: 65 00 09 00 54 00 69 00 6D 00 65 00 09 00 44 00 e...T.i.m.e...D.
000000D0: 69 00 73 00 63 00 20 00 4E 00 75 00 6D 00 62 00 i.s.c. .N.u.m.b.
000000E0: 65 00 72 00 09 00 44 00 69 00 73 00 63 00 20 00 e.r...D.i.s.c. .
000000F0: 43 00 6F 00 75 00 6E 00 74 00 09 00 54 00 72 00 C.o.u.n.t...T.r.
新文件
0000: 4E 61 6D 65 09 41 72 74 69 73 74 09 43 6F 6D 70 Name.Artist.Comp
0010: 6F 73 65 72 09 41 6C 62 75 6D 09 47 72 6F 75 70 oser.Album.Group
0020: 69 6E 67 09 57 6F 72 6B 09 4D 6F 76 65 6D 65 6E ing.Work.Movemen
0030: 74 20 4E 75 6D 62 65 72 09 4D 6F 76 65 6D 65 6E t Number.Movemen
0040: 74 20 43 6F 75 6E 74 09 4D 6F 76 65 6D 65 6E 74 t Count.Movement
0050: 20 4E 61 6D 65 09 47 65 6E 72 65 09 53 69 7A 65 Name.Genre.Size
您的文件开头看起来像UTF-16-请参阅字节顺序标记-Wikipedia
采用
file_name="full path to file goes here"
with io.open(file_name,'r', encoding='utf-16-le') as f:
for line in f:
# do something with line
打开时。
逐行读取时无需使用range()或readlines()。 如果您确实需要行号,请使用:
for lineNr,line in enumerate(f):
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.