Read a binary file (Python)

I can't read a file, and I don't understand why:

f = open("test/test.pdf", "r")
data = list(f.read())
print data

Returns: []

I would like to open a PDF, extract every byte, and put the bytes in a list.

What's wrong with my code? :(

Thanks,

f = open("test/test.pdf", "rb")

You must include the pseudo-mode "b" for binary when reading and writing on Windows. Otherwise the OS silently translates what it considers to be "line endings", causing I/O corruption.
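To make the translation concrete, here is a small sketch that writes a few bytes resembling a PDF header to a temporary file and reads them back in both modes. It is written for Python 3 (the thread itself uses Python 2), and the sample bytes and the latin-1 encoding are assumptions chosen for illustration. Note that in Python 3, text mode translates line endings on every platform by default, not only on Windows.

```python
import os
import tempfile

# Sample bytes loosely modelled on a PDF header (an assumption for illustration).
sample = b"%PDF-1.5\r%\xe2\xe3\xcf\xd3\r\n1 0 obj"

# Write the sample to a temporary file in binary mode.
fd, path = tempfile.mkstemp(suffix=".pdf")
with os.fdopen(fd, "wb") as f:
    f.write(sample)

# Binary mode: the bytes come back exactly as written.
with open(path, "rb") as f:
    raw = f.read()

# Text mode: Python 3 decodes the bytes and, by default, translates
# "\r" and "\r\n" to "\n". latin-1 is used only because it can decode any byte.
with open(path, "r", encoding="latin-1") as f:
    text = f.read()

os.remove(path)

print(raw == sample)        # True
print(len(raw), len(text))  # 23 22: the "\r\n" pair collapsed to one "\n"
```

The text-mode result is one character shorter than the file, which is exactly the kind of silent change that corrupts binary data.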

Jonathan is correct that you should open the file in binary mode if you are on Windows.

However, a PDF file will start with "%PDF-", which would at least be read in regardless of whether you are using binary mode or not.

So it appears to me that your "test/test.pdf" is an empty file.
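One way to test that hypothesis is to look at the file's size and its first few bytes before reading the rest. The sketch below is Python 3; looks_like_pdf is a hypothetical helper name, not part of any library.

```python
import os


def looks_like_pdf(path):
    """Return True if the file is non-empty and starts with b"%PDF-"."""
    if os.path.getsize(path) == 0:
        return False  # an empty file would explain the empty list
    with open(path, "rb") as f:
        # Read only the first five bytes and compare them to the PDF marker.
        return f.read(5) == b"%PDF-"
```

Calling looks_like_pdf("test/test.pdf") and getting False would point to an empty or non-PDF file rather than to the open mode.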

  • As far as I understand the PDF format, a PDF file isn't strictly a binary file. It is a text file that may contain lots of binary blobs. I could be wrong.
  • On Windows, if you are opening a binary file, you need to include b in the mode of your file, i.e. open(filename, "rb").
    • On Unix-like systems, the b doesn't hurt anything, though in Python 2 it does not mean anything.
  • Always use a context manager with your files. That is to say, instead of writing f = open("test/test.pdf", "rb"), say with open("test/test.pdf", "rb") as f:. This will ensure your file always gets closed.
  • list(f.read()) is not likely to be useful code very often. f.read() returns a str, and calling list on it makes a list of the characters (one-byte strings). This is very seldom needed.
  • Binary or text or whatever, read should work. Are you positive that there is anything in test/test.pdf? Python does not seem to think there is.
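Putting these points together, a minimal sketch of reading every byte into a list might look like the following. It is written for Python 3, where iterating over a bytes object gives integers rather than the one-character strings that Python 2 produces; read_bytes_as_list is a hypothetical helper name.

```python
def read_bytes_as_list(path):
    """Read a file in binary mode and return its bytes as a list of ints."""
    # The with-statement closes the file even if read() raises.
    with open(path, "rb") as f:
        data = f.read()
    return list(data)
```

For most purposes the bytes object returned by f.read() (or a bytearray, if it has to be mutable) is more convenient than a list.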

What platform are you running on?

Using Python 2.6 on Windows XP, I get:

f = open("14500lf.pdf", "r")
data = list(f.read())
print data
['%', 'P', 'D', 'F', '-', '1', '.', '5', '\r', '%', '\xe2', '\xe3', '\xcf', '\xd3', '\n', '1', ' ', '0', ' ', 'o', 'b', 'j', '<', '<', '/', 'C', 'o', 'n', 't', 'e', 'n', 't', 's', ' ', '3', ' ', '0', ' ', 'R', '/', 'T', 'y', 'p', 'e', '/', 'P', 'a', 'g', 'e', '/', 'P', 'a', 'r', 'e', 'n', 't', ' ', '8', '7', ' ', '0', ' ', 'R', '/', 'T', 'h', 'u', 'm', 'b', ' ', '7', '1', ' ', '0', ' ', 'R', '/', 'R', 'o', 't', 'a', 't', 'e', ' ', '0', '/', 'M', 'e', 'd', 'i', 'a', 'B', 'o', 'x', '[', '0', ' ', '0', ' ', '6', '1', '2', ' ', '7', '9', '2', ']', '/', 'C', 'r', 'o', 'p', 'B', 'o', 'x', '[', '0', ' ', '0', ' ', '6', '1', '2', ' ', '7', '9', '2', ']', '/', 'R', 'e', 's', 'o', 'u', 'r', 'c', 'e', 's', ' ', '2', ' ', '0', ' ', 'R', '>', '>', '\r', 'e', 'n', 'd', 'o', 'b', 'j', '\r', '2', ' ', '0', ' ', 'o', 'b', 'j', '<', '<', '/', 'C', 'o', 'l', 'o', 'r', 'S', 'p', 'a', 'c', 'e', '<', '<', '/', 'D', 'e', 'f', 'a', 'u', 'l', 't', 'R', 'G', 'B', ' ', '1', '0', '0', ' ', '0', ' ', 'R', '>', '>', '/', 'F', 'o', 'n', 't', '<', '<', '/', 'F', '5', ' ', '9', '6', ' ', '0', ' ', 'R', '/', 'F', '7', ' ', '9', '7', ' ', '0', ' ', 'R', '/', 'F', '9', ' ', '1', '0', '6', ' ',
'0', ' ', 'R', '/', 'F', '1', '1', ' ', '1', '0', '7', ' ', '0', ' ', 'R', '/', 'F', '1', '4', ' ', '1', '1', '1', ' ', '0', ' ', 'R', '/', 'F', '1', '6', ' ', '1', '1', '6', ' ', '0', ' ', 'R', '/', 'F', '1', '7', ' ', '1', '1', '7', ' ', '0', ' ', 'R', '/', 'F', '1', '3', ' ', '1', '1', '2', ' ', '0', ' ', 'R', '>', '>', '/', 'P', 'r', 'o', 'c', 'S', 'e', 't', '[', '/', 'P', 'D', 'F', '/', 'T', 'e', 'x', 't', ']', '>', '>', '\r', 'e', 'n', 'd', 'o', 'b', 'j', '\r', '3', ' ', '0', ' ', 'o', 'b', 'j', '<', '<', '/', 'L', 'e', 'n', 'g', 't', 'h', ' ', '4', ' ', '0', ' ', 'R', '/', 'F', 'i', 'l', 't', 'e', 'r', '/', 'F', 'l', 'a', 't', 'e', 'D', 'e', 'c', 'o', 'd', 'e', '>', '>', 's', 't', 'r', 'e', 'a', 'm', '\n', 'H', '\x89', '\xa4', 'W', '\xd9', 'r', 'T', '\xc9', '\x11', '\xfd', '\x82', '\xfb', '\x0f', '\xf5', '\xd8', '\n', '\x8f', '\x8a', '\xda', '\x97', 'G', '!', '\x04', '\x06', '\x03']

That is from a PDF I happen to have on my desktop (it's an IC datasheet, LTC1450).

Using "rb" (read binary):

f = open("14500lf.pdf", "rb")
data = list(f.read())
print data
['%', 'P', 'D', 'F', '-', '1', '.', '5', '\r', '%', '\xe2', '\xe3', '\xcf', '\xd3', '\r', '\n', '1', ' ', '0', ' ', 'o', 'b', 'j', '<', '<', '/', 'C', 'o', 'n', 't', 'e', 'n', 't', 's', ' ', '3', ' ', '0', ' ', 'R', '/', 'T', 'y', 'p', 'e', '/', 'P', 'a', 'g', 'e', '/', 'P', 'a', 'r', 'e', 'n', 't', ' ', '8', '7', ' ', '0', ' ', 'R', '/', 'T', 'h', 'u', 'm', 'b', ' ', '7', '1', ' ', '0', ' ', 'R', '/', 'R', 'o', 't', 'a', 't', 'e', ' ', '0', '/', 'M', 'e', 'd', 'i', 'a', 'B', 'o', 'x', '[', '0', ' ', '0', ' ', '6', '1', '2', ' ', '7', '9', '2', ']', '/', 'C', 'r', 'o', 'p', 'B', 'o', 'x', '[', '0', ' ', '0', ' ', '6', '1', '2', ' ', '7', '9', '2', ']', '/', 'R', 'e', 's', 'o', 'u', 'r', 'c', 'e', 's', ' ', '2', ' ', '0', ' ', 'R', '>', '>', '\r', 'e',

... snip a few thousand lines ...

'9', '1', ' ', '0', ' ', 'R', '/', 'I', 'D', '[', '<', 'd', 'd', '3', 'd', '2', '8', '5', 'e', '1', 'd', '9', '0', '4', '6', 'e', '1', 'f', '6', 'e', '7', '0', '8', 'b', 'd', '8', 'e', '4', 'f', '9', 'b', '1', '3', '>', '<', '4', '3', '8', 'a', '7', '7', '2', '3', 'f', 'b', '2', '9', 'e', '7', '4', '6', 'a', '4', 'd', '4', '1', '6', 'a', 'f', '7', '6', '2', 'd', '8', '0', '9', '5', '>', ']', '>', '>', '\r', '\n', 's', 't', 'a', 'r', 't', 'x', 'r', 'e', 'f', '\r', '\n', '2', '9', '0', '2', '6', '9', '\r', '\n', '%', '%', 'E', 'O', 'F', '\r', '\n']
