
Downcast to float16 in pandas.to_numeric

I was wondering why pd.to_numeric cannot downcast to np.float16. The code says:

# pandas support goes only to np.float32,
# as float dtypes smaller than that are
# extremely rare and not well supported

Link to code: https://github.com/pandas-dev/pandas/blob/baa77c33fb71c29acea21ba06adaf426ed4cb561/pandas/core/tools/numeric.py#L164

Extremely rare? I have a lot of DataFrames with values that perfectly fit into a np.float16 array. Not well supported? Can you give more details?

Thanks!!

A lot of data can fit in np.float16, as you point out, but the problem usually arises when you use these numbers in computations. As unutbu said:

Arithmetic errors accumulate quite quickly with float16s: np.array([0.1,0.2], dtype='float16').sum() equals (approximately) 0.2998. Especially when computations require thousands of arithmetic operations, this can be an unacceptable amount of error for many applications.
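To see the accumulation for yourself, here is a minimal sketch. It reproduces the two-element sum from the quote and then adds 0.1 repeatedly in float16 and in float64. The step count and the variable names are my own choices for illustration, not anything taken from pandas.

```python
import numpy as np

# The two-element sum from the quote, done entirely in float16.
pair_sum = np.array([0.1, 0.2], dtype="float16").sum()
print("float16 0.1 + 0.2 =", pair_sum)

# Repeatedly add 0.1 in float16 and in float64 (the step count is arbitrary).
n_steps = 10_000
total16 = np.float16(0.0)
total64 = np.float64(0.0)
for _ in range(n_steps):
    total16 = total16 + np.float16(0.1)  # rounded to float16 after every add
    total64 = total64 + np.float64(0.1)

print("float16 running sum:", total16)
print("float64 running sum:", total64)  # close to the exact answer, 1000
```

The float16 total stops growing long before it reaches 1000: once the running sum is large enough, 0.1 is smaller than half the gap between neighbouring float16 values, so every further addition rounds back to the same number.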

Even setting rounding error aside, the limited range causes problems. The largest finite np.float16 value is only 65504, so anything larger overflows to infinity, and magnitudes below roughly 6e-8 underflow to zero; NaN then appears as soon as those infinities meet in an operation such as inf - inf. This limits its usefulness to a narrow set of calculations that are less likely to come up in real-world scenarios. With the storage and processing power we now have available, there is rarely a need to constrain ourselves to 16 bits. So the pandas developers decided not to offer it in pd.to_numeric, since the uses are few and the downsides many. A lot of software also does not support this type because it is not commonly used, and that is what they mean by "not well supported".
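The range limits are easy to check with NumPy. This is only a sketch: the three sample values are arbitrary picks of mine (one at the maximum, one above it, one tiny), and np.errstate is there just to silence the overflow and invalid-operation warnings.

```python
import numpy as np

info = np.finfo(np.float16)
print("largest finite float16:", info.max)

# One value at the limit, one above it, one too small to represent.
values = np.array([65504.0, 70000.0, 1e-8])

with np.errstate(over="ignore", under="ignore", invalid="ignore"):
    as_half = values.astype(np.float16)       # 70000 overflows, 1e-8 underflows
    inf_minus_inf = as_half[1] - as_half[1]   # inf - inf is not a number

print("cast to float16:", as_half)
print("inf - inf:", inf_minus_inf)
```

The same three values survive a cast to float32 or float64 unchanged in magnitude, which is part of why float32 is a more comfortable lower bound.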

Of course, you can conjure up an extremely large number of examples that can fit in the np.float16 format but there are also many, many more that can't. This is not to say that there aren't applications where smaller number sizes are better. But those applications will probably not be using pandas.
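If you are sure your data fits, you can still cast explicitly instead of relying on pd.to_numeric. A rough sketch, assuming a small Series whose values are exactly representable in float16 (the sample values and variable names are made up for illustration):

```python
import numpy as np
import pandas as pd

values = pd.Series([1.0, 2.5, 3.0])  # float64 by default

# downcast="float" goes no lower than float32.
downcast = pd.to_numeric(values, downcast="float")
print(downcast.dtype)

# An explicit cast is needed to get float16.
half = values.astype(np.float16)
print(half.dtype)

# Check that nothing was lost before trusting the smaller dtype.
round_trip_ok = bool((half.astype("float64") == values).all())
print("lossless round trip:", round_trip_ok)
```

The round-trip check matters: with less convenient values the cast silently rounds or overflows, so it is worth verifying before you keep the float16 column.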
