Downcast to float16 in pandas.to_numeric

I was wondering why the pd.to_numeric method cannot downcast to np.float16. The code says:

# pandas support goes only to np.float32,
# as float dtypes smaller than that are
# extremely rare and not well supported

Link to code: https://github.com/pandas-dev/pandas/blob/baa77c33fb71c29acea21ba06adaf426ed4cb561/pandas/core/tools/numeric.py#L164
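The behaviour being asked about can be reproduced with a short sketch. The Series values here are made up for illustration and were chosen to be exactly representable in float16. The sketch assumes a reasonably recent pandas and NumPy.

```python
import numpy as np
import pandas as pd

# Hypothetical data: every value is exactly representable in float16.
s = pd.Series([1.0, 2.5, 1024.0], dtype="float64")

# downcast="float" stops at float32, even though float16 would hold the data.
downcast = pd.to_numeric(s, downcast="float")
print(downcast.dtype)

# Converting to float16 explicitly is still possible with astype.
manual = s.astype(np.float16)
print(manual.dtype)
```

So float16 is not forbidden in pandas as a dtype. It is only excluded from the automatic downcast performed by pd.to_numeric.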

Extremely rare? I have a lot of DataFrames with values that fit perfectly into a np.float16 array. Not well supported? Can you give more details?

Thanks!!

A lot of data can fit in the np.float16 type, as you point out, but the problem usually comes when you use these numbers for computations. As unutbu said:

Arithmetic errors accumulate quite quickly with float16s: np.array([0.1,0.2], dtype='float16').sum() equals (approximately) 0.2998. Especially when computations require thousands of arithmetic operations, this can be an unacceptable amount of error for many applications.
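To make the accumulation point concrete, here is a minimal sketch. The helper running_total is a hypothetical name introduced only for this illustration. It adds 0.1 ten thousand times, rounding to the chosen dtype after every step.

```python
import numpy as np

# The sum quoted above: inputs and result are all rounded to float16.
small_sum = np.array([0.1, 0.2], dtype="float16").sum()
print(small_sum)  # approximately 0.2998

def running_total(dtype, step=0.1, n=10_000):
    """Add `step` to a running total `n` times, rounding to `dtype` each time."""
    total = dtype(0)
    increment = dtype(step)
    for _ in range(n):
        total = dtype(total + increment)
    return total

half_total = running_total(np.float16)
double_total = running_total(np.float64)

# The exact answer is 1000. The float64 total is very close to it, while the
# float16 total stops growing once the increment falls below half the spacing
# between neighbouring float16 values.
print(half_total, double_total)
```

The float16 total ends up far from 1000, which is the kind of error that shows up after thousands of operations.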

Even ignoring the rounding errors, you can run into all sorts of issues related to the size of the type. NaN and infinity issues creep in when you use extremely small or extremely large values that np.float16 simply cannot represent. This limits its usefulness to specific calculations that are less likely to come up in real-world scenarios. The maximum representable value is only 65504. And with the storage and processing power available on today's computers, there is rarely a need to constrain ourselves to this anymore. So the pandas developers decided not to allow this in pd.to_numeric, as there aren't many uses for it while there are many downsides. A lot of software does not support this type at all because it is not commonly used, and that is what they mean by "not well supported".
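The range limits are easy to check with NumPy alone. This is a small sketch with arbitrarily chosen input values. np.errstate is used only to silence the overflow and invalid-value warnings that these operations may raise.

```python
import numpy as np

# Largest finite float16 value.
max_half = float(np.finfo(np.float16).max)
print(max_half)  # 65504.0

with np.errstate(all="ignore"):
    # A value above the maximum overflows to infinity when cast.
    big = np.array([70000.0]).astype(np.float16)
    # A value far below the smallest positive float16 underflows to zero.
    tiny = np.array([1e-8]).astype(np.float16)
    # Once infinities appear, later arithmetic can produce NaN.
    diff = big - big

print(big, tiny, diff)
```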

Of course, you can come up with a very large number of examples that fit in the np.float16 format, but there are many, many more that can't. This is not to say that there aren't applications where smaller number sizes are better. But those applications will probably not be using pandas.
