
How to output list of floats to a binary file in Python

I have a list of floating-point values in Python:

floats = [3.14, 2.7, 0.0, -1.0, 1.1]

I would like to write these values out to a binary file using IEEE 32-bit encoding. What is the best way to do this in Python? My list actually contains about 200 MB of data, so something "not too slow" would be best.

Since there are 5 values, I just want a 20-byte file as output.

Alex is absolutely right, it's more efficient to do it this way:

from array import array
output_file = open('file', 'wb')
float_array = array('f', [3.14, 2.7, 0.0, -1.0, 1.1])  # 'f' = IEEE 754 32-bit float; 'd' would write 64-bit doubles
float_array.tofile(output_file)
output_file.close()

And then read the array back like this:

input_file = open('file', 'rb')
float_array = array('f')
float_array.frombytes(input_file.read())  # array.fromstring() was removed in Python 3.9
input_file.close()

array.array objects also have a .fromfile method which can be used for reading the file, if you know the count of items in advance (e.g. from the file size, or some other mechanism).
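
A rough sketch of that approach, assuming the 'f' typecode and the 'file' name used above (the item-count arithmetic here is an addition, not part of the original answer):

import os
from array import array

float_array = array('f')
item_count = os.path.getsize('file') // float_array.itemsize  # itemsize is 4 for 'f'
with open('file', 'rb') as input_file:
    float_array.fromfile(input_file, item_count)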

See: Python's struct module

import struct
s = struct.pack('f'*len(floats), *floats)
f = open('file','wb')
f.write(s)
f.close()
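
Reading the file back with struct is symmetric; a minimal sketch, assuming the same 'file' name and that the whole file fits in memory (this is an addition, not part of the original answer):

import struct

with open('file', 'rb') as f:
    data = f.read()
floats = struct.unpack('f' * (len(data) // 4), data)  # 4 bytes per 32-bit float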

The array module in the standard library may be more suitable for this task than the struct module which everybody is suggesting. Performance with 200 MB of data should be substantially better with array.

If you'd like to look at a variety of options, try profiling on your system with something like this:
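
A minimal timing sketch along those lines, assuming the struct, array, and NumPy variants discussed on this page; the data size, file names, and repetition count are illustrative assumptions, not from the original answer:

import struct, timeit
from array import array
import numpy as np

floats = [3.14, 2.7, 0.0, -1.0, 1.1] * 1000000  # roughly 20 MB of 32-bit output

def with_struct():
    with open('out_struct.bin', 'wb') as f:
        f.write(struct.pack('%df' % len(floats), *floats))

def with_array():
    with open('out_array.bin', 'wb') as f:
        array('f', floats).tofile(f)

def with_numpy():
    with open('out_numpy.bin', 'wb') as f:
        np.array(floats, dtype=np.float32).tofile(f)

for fn in (with_struct, with_array, with_numpy):
    print(fn.__name__, timeit.timeit(fn, number=3))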

I'm not sure how NumPy will compare performance-wise for your application, but it may be worth investigating.

Using NumPy:

import numpy as np
a = np.array(floats, dtype=np.float32)  # floats is the list from the question
output_file = open('file', 'wb')
a.tofile(output_file)
output_file.close()

results in a 20-byte file as well.
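
For the round trip, the file can be read back with numpy.fromfile; a short sketch, assuming the same 'file' name (an addition, not part of the original answer):

import numpy as np
a = np.fromfile('file', dtype=np.float32)  # returns a 1-D array of the 5 values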

struct.pack() looks like what you need.

http://docs.python.org/library/struct.html

I ran into a similar issue while inadvertently writing a 100+ GB CSV file. The answers here were extremely helpful but, to get to the bottom of it, I profiled all of the solutions mentioned and then some. All profiling runs were done on a 2014 MacBook Pro with an SSD using Python 2.7. From what I'm seeing, the struct approach is definitely the fastest from a performance point of view:

6.465 seconds print_approach    print list of floats
4.621 seconds csv_approach      write csv file
4.819 seconds csvgz_approach    compress csv output using gzip
0.374 seconds array_approach    array.array.tofile
0.238 seconds numpy_approach    numpy.array.tofile
0.178 seconds struct_approach   struct.pack method

My "answer" is really a comment on the various answers. I can't comment since I don't have 50 reputation.

If the file is to be read back by Python, then use the "pickle" module. This one tool can read and write many kinds of Python objects in binary form.
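
A minimal sketch of that, assuming the list of floats from the question; note that pickle stores Python objects in its own format, not raw IEEE 32-bit values (the file name here is an assumption):

import pickle

floats = [3.14, 2.7, 0.0, -1.0, 1.1]
with open('floats.pkl', 'wb') as f:
    pickle.dump(floats, f)
with open('floats.pkl', 'rb') as f:
    floats_back = pickle.load(f)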

But the way the question is asked, mentioning "IEEE 32-bit encoding", it sounds like the file will be read back in other languages. In that case the byte order should be specified. The problem is that most machines are x86, with little-endian byte order, but the number one data-processing language is Java/JVM, which uses big-endian byte order. So Python's tofile() will write in the machine's native little-endian order, and the data-processing code on the Java/JVM side will decode using big endian, leading to errors.

To work with the JVM:

# convert to bytes, BIG endian, for use by Java
import struct
f = [3.14, 2.7, 0.0, -1.0, 1.1]
b = struct.pack('>'+'f'*len(f), *f)

with open("f.bin", "wb") as file:
    file.write(b)

On the Java side:

try(var stream = new DataInputStream(new FileInputStream("f.bin")))
{
    for(int i = 0; i < 5; i++)
        System.out.println(stream.readFloat());  // readFloat() decodes big-endian IEEE 32-bit
}
catch(Exception ex) { ex.printStackTrace(); }

The problem now is the Python 'f'*len(f) code - hopefully the interpreter doesn't actually create a super long "ffffff..." format string.
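
If that is a concern, struct also accepts a count prefix in the format string, so the repeated-character string can be avoided; a short sketch (an addition to the original answer):

import struct

f = [3.14, 2.7, 0.0, -1.0, 1.1]
b = struct.pack('>%df' % len(f), *f)  # '>5f': five big-endian 32-bit floats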

I would use a numpy array and byteswap:

import numpy, sys
f = numpy.array([3.14, 2.7, 0.0, -1.0, 1.1], dtype=numpy.float32)

if sys.byteorder == "little":
    f.byteswap().tofile("f.bin") # using BIG endian, for use by Java
else:
    f.tofile("f.bin")
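
An alternative that avoids the byte-order branch is an explicitly big-endian dtype, which NumPy supports; a minimal sketch (not part of the original answer):

import numpy as np

np.array([3.14, 2.7, 0.0, -1.0, 1.1], dtype='>f4').tofile("f.bin")  # '>f4' = big-endian 32-bit float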
