Write List of Tuples (int, float) to Stream Without Converting to String

I have a list in Python that consists of tuples with the following format: (int, float). I want to write this list to an io byte stream or io raw stream without having to convert the ints and/or floats to strings. How can I do this? Thanks.

There are many formats that can be used to serialize Python objects into bytes. Each of them has pros and cons.

If the data is only a list of tuples of integers and floats, that makes the job rather simple.

Let's assume this is the data:

data = 100 * [(1, 1.111), (18, 1.234), (555555, 0.001), (-1, 1e70)]

Which of these formats falls into the category of "strings" is not clear to me. The most obvious "string" format would be str(data). How big is it?

>>> len(str(data))
5500

This takes up 5500 bytes. The question asks for something more compact, so we're looking for something much shorter than 5500 bytes.

JSON is a very popular format (it is also a string). How big is it?

>>> import json
>>> len(json.dumps(data))
5500

This has the same size (5500 bytes), but at least it is well defined. Can it be smaller? How about a BZipped JSON?

>>> import bz2
>>> len(bz2.compress(json.dumps(data).encode('utf-8')))
131

That is much better!
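Reading the data back means reversing both steps. The following is a minimal sketch using the same sample data; the names blob, restored and restored_tuples are only illustrative. Note that JSON has no tuple type, so the pairs come back as lists and have to be converted if tuples are needed.

```python
import bz2
import json

data = 100 * [(1, 1.111), (18, 1.234), (555555, 0.001), (-1, 1e70)]

# Serialize to JSON text, encode it to bytes, then compress.
blob = bz2.compress(json.dumps(data).encode("utf-8"))

# Reverse the steps: decompress, decode, parse.
restored = json.loads(bz2.decompress(blob).decode("utf-8"))

# JSON arrays are parsed as lists, so convert each pair back to a tuple.
restored_tuples = [tuple(pair) for pair in restored]
print(restored_tuples == data)
```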

The result is probably this small because the data is a repeating pattern. Is there a format that does not use compression? Maybe pickle?

>>> import pickle
>>> len(pickle.dumps(data))
862

Not as good as zip (of course!), but still good.

Could we make a BZipped pickle?
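Since the question is about writing to a stream, here is a small sketch of pickling straight into an in-memory byte stream with pickle.dump and reading it back with pickle.load. The io.BytesIO object stands in for whatever binary stream you actually have. Keep in mind that unpickling data from an untrusted source is unsafe.

```python
import io
import pickle

data = 100 * [(1, 1.111), (18, 1.234), (555555, 0.001), (-1, 1e70)]

# An in-memory binary stream; a file opened in "wb" mode works the same way.
stream = io.BytesIO()
pickle.dump(data, stream)

# Rewind and load the list back from the stream.
stream.seek(0)
restored = pickle.load(stream)
print(restored == data)
```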

>>> len(bz2.compress(pickle.dumps(data)))
155

Better, but there is no reason for it to be better than BZipped JSON.

How about some other format? You could convert each tuple to the equivalent of this C structure, using the struct module:

struct {
    int i;
    double f;
};

However, you would then have to know how big the int can be. A Python int can be as large as you want, but if you know, for example, that all numbers are between 0 and 255, one byte is enough. For the float, you need 64 bits (i.e. 8 bytes), or you lose precision. With a 4-byte int, each tuple takes 12 bytes, so the 400 tuples above come to 4800 bytes. Not very good.
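To check the size for the sample data, here is a minimal sketch that packs every tuple as a fixed-size record. It assumes every int fits in a signed 32-bit field (format code i), which holds for this data but not for arbitrary Python ints; the names record and packed are only illustrative.

```python
import struct

data = 100 * [(1, 1.111), (18, 1.234), (555555, 0.001), (-1, 1e70)]

# "<" = little-endian with standard sizes and no padding,
# "i" = 4-byte signed int (assumed wide enough), "d" = 8-byte double.
record = struct.Struct("<id")

# Concatenate one fixed-size record per tuple.
packed = b"".join(record.pack(i, f) for i, f in data)
print(record.size, len(packed))  # 12 4800
```

struct.pack raises struct.error if an int does not fit in the chosen field, so an out-of-range value is detected rather than silently truncated.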

There are also other built-in options described in the Persistence section of Python's documentation.

You can also invent your own format.

In the end, you have to decide what suits you best.

You can easily dump integers and floats directly into bytes using the struct module.

>>> import struct
>>> data = [(2, 1.0), (3, 2.0), (25, 55.5)]
>>> for tup in data:
...     bytes_data = struct.pack("<ld", *tup)
...     print(bytes_data)
...
b'\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\xf0?'
b'\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00@'
b'\x19\x00\x00\x00\x00\x00\x00\x00\x00\xc0K@'

As an aside, the string I pass as the first argument to the pack function is a format string that specifies the type and size of each number. In this case, < selects little-endian byte order with standard sizes, l is a signed long (4 bytes), and d is a double-precision float (8 bytes).
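To connect this back to the question, the packed records can be written to a binary stream and read back again. The following is a sketch using an in-memory io.BytesIO as the stream; any object with a binary write method should work the same way. The names record, stream and restored are only illustrative.

```python
import io
import struct

data = [(2, 1.0), (3, 2.0), (25, 55.5)]

# Precompile the format: 4-byte signed long + 8-byte double, little-endian.
record = struct.Struct("<ld")

# Write one 12-byte record per tuple to the stream.
stream = io.BytesIO()
for tup in data:
    stream.write(record.pack(*tup))

# Rewind, read all the bytes, and unpack them record by record.
stream.seek(0)
restored = list(record.iter_unpack(stream.read()))
print(restored)
```

Because every record has the same size, a reader can also fetch a single tuple by reading record.size bytes at a time instead of loading the whole stream.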
