How can I use pandas DataFrame to_dict with float32 without additional decimal places?

Question

I'd like to use dtype='float32' (presumably the NumPy dtype np.float32) instead of dtype='float64' to reduce the memory usage of my pandas DataFrames, because I have to handle huge DataFrames.

At one point, I'd like to extract a Python list with .to_dict(orient='records') in order to get a dictionary for each row.

In this case, I get additional decimal places, which are probably caused by something like this:

Is floating point math broken?

How can I cast the data, change the type, etc. in order to get the same result as I get with float64 (see the example snippets)?

import pandas as pd

_data = {'col1': [1.45123, 1.64123], 'col2': [0.1, 0.2]}

_test = pd.DataFrame(_data).astype(dtype='float64')

print(f"{_test=}")
print(f"{_test.round(1)=}")
print(f"{_test.to_dict(orient='records')=}")
print(f"{_test.round(1).to_dict(orient='records')=}")

float64 output:


_test=      col1  col2
0  1.45123   0.1
1  1.64123   0.2
_test.round(1)=   col1  col2
0   1.5   0.1
1   1.6   0.2
_test.to_dict(orient='records')=[{'col1': 1.45123, 'col2': 0.1}, {'col1': 1.64123, 'col2': 0.2}]
_test.round(1).to_dict(orient='records')=[{'col1': 1.5, 'col2': 0.1}, {'col1': 1.6, 'col2': 0.2}]

import pandas as pd

_data = {'col1': [1.45123, 1.64123], 'col2': [0.1, 0.2]}

_test = pd.DataFrame(_data).astype(dtype='float32')

print(f"{_test=}")
print(f"{_test.round(1)=}")
print(f"{_test.to_dict(orient='records')=}")
print(f"{_test.round(1).to_dict(orient='records')=}")

float32 output:

_test=      col1  col2
0  1.45123   0.1
1  1.64123   0.2
_test.round(1)=   col1  col2
0   1.5   0.1
1   1.6   0.2
_test.to_dict(orient='records')=[{'col1': 1.4512300491333008, 'col2': 0.10000000149011612}, {'col1': 1.6412299871444702, 'col2': 0.20000000298023224}]
_test.round(1).to_dict(orient='records')=[{'col1': 1.5, 'col2': 0.10000000149011612}, {'col1': 1.600000023841858, 'col2': 0.20000000298023224}]

Answer 1

Floating-point representation has some inherent limitations.

Calling to_dict() switches from the NumPy representation to native Python floats, which are 64-bit, so a translation takes place. Whatever precision you use, a value such as 0.1 cannot be stored exactly; the float32 value is widened as it is, and Python then prints enough digits to identify the 64-bit value, which exposes the rounding error that was already present in float32.
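The effect can be reproduced without pandas. The following is a minimal sketch, assuming only NumPy is installed; the variable names are illustrative and not from the question.

```python
import numpy as np

x32 = np.float32(0.1)  # nearest float32 to 0.1
x64 = float(x32)       # widened to a 64-bit Python float, no bits change

print(x32)         # 0.1 (shortest digits that identify the float32 value)
print(x64)         # 0.10000000149011612 (shortest digits for the float64 value)
print(x64 == 0.1)  # False: the float32 rounding error is now visible

# Going back to float32 recovers the original value unchanged.
print(np.float32(x64) == x32)  # True
```

The widening itself changes nothing; only the number of digits needed to print the value differs between the two types.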

To keep the values exactly as they are displayed, cast the numbers to strings before calling to_dict(), using the astype() function:

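import pandas as pd

_data = {'col1': [1.45123, 1.64123], 'col2': [0.1, 0.2]}
_test = pd.DataFrame(_data).astype(dtype='float32')
# Round first, then cast to str so each value keeps its short text form
print(f"{_test.round(1).astype('str').to_dict(orient='records')=}")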

_test.round(1).astype('str').to_dict(orient='records')=[{'col1': '1.5', 'col2': '0.1'}, {'col1': '1.6', 'col2': '0.2'}]

An alternative is to use a decimal format.
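The answer does not say which decimal format is meant. Assuming it refers to Python's `decimal.Decimal`, a rough sketch could look like this; the variable names are illustrative.

```python
from decimal import Decimal

import numpy as np
import pandas as pd

_data = {'col1': [1.45123, 1.64123], 'col2': [0.1, 0.2]}
_test = pd.DataFrame(_data).astype(dtype='float32')

# Assumption: "decimal format" means decimal.Decimal values.
# Each value is narrowed back to float32 and converted via its short text form.
decimal_records = [
    {col: Decimal(str(np.float32(val))) for col, val in row.items()}
    for row in _test.to_dict(orient='records')
]

print(decimal_records)
```

Decimal values compare and print exactly as written, but they are Python objects, so this suits the extracted records rather than the large DataFrame itself.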
