
pandas to_numeric(…, downcast='float') losing precision

Downcasting a pandas DataFrame (column by column) from float64 to float32 results in a loss of precision, even though both the largest (9.761140e+02) and smallest (0.000000e+00) elements fit comfortably in float32.
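For context, the rounding error of a single float64 → float32 cast is bounded by the float32 machine epsilon, so each individual value in this range survives the cast almost unchanged; a small sketch (not from the original post) illustrating that bound:

```python
import numpy as np

# Casting one float64 to float32 rounds to the nearest representable value.
# Relative error is bounded by float32 eps (~1.19e-07), so a value near the
# question's maximum of 976.114 is off by well under 1e-3 in absolute terms.
x = 9.761140e+02
err = abs(float(np.float32(x)) - x)
print(err)                        # well below 1e-3
print(np.finfo(np.float32).eps)   # ~1.19e-07
```

So the cast itself can only explain a tiny per-element error, which makes the large shift in the column mean surprising.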

The dataset is fairly large: 55 million rows by 12 columns. The mean of one particular column is 1.343987e+00 without downcasting, and 1.224472e+00 after.

I am getting the same results with astype() .
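That is expected: on an all-float column, `pd.to_numeric(..., downcast='float')` and `Series.astype(np.float32)` both round each float64 to the nearest float32, so they should produce identical values. A minimal check (toy data, not the question's dataset):

```python
import numpy as np
import pandas as pd

# Both paths round each float64 element to the nearest float32,
# so the resulting values are bit-for-bit identical.
s = pd.Series([0.0, 1.343987, 9.761140e+02])
a = pd.to_numeric(s, downcast='float')
b = s.astype(np.float32)
print(a.dtype, b.dtype)        # float32 float32
print((a.values == b.values).all())  # True
```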

This was a pretty interesting question. I tested several DataFrames ranging from 1 million records up to 55 million, the same size as yours, keeping the min and max values similar to the ones you have.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

x, y = [], []
for idx, num in enumerate(range(1, 57, 2)):
    print(f"{idx+1}) Testing with {num} million records...")
    rows = num*(10**6)
    cols = ['col']

    # one float64 column drawn from the same [0, 976.114] range as the question
    df = pd.DataFrame(np.random.uniform(0, 9.761140e+02, size=(rows, len(cols))), columns=cols)
    df['col1'] = pd.to_numeric(df['col'], downcast='float')  # float64 -> float32
    df['diff'] = df['col'] - df['col1']  # per-element cast error (for inspection)

    # drift of the column mean introduced by the downcast
    diff = df['col'].mean() - df['col1'].mean()

    x.append(num)
    y.append(diff)

plt.plot(x, y, 'ro')
plt.xlabel('number of rows (millions)')
plt.ylabel('precision value lost')
plt.show()

Here's the plot:

[plot: precision value lost vs. number of rows (millions)]

Based on the plot, it seems that after 35 million records there is a sudden increase in the loss of precision, and the growth appears to be logarithmic in nature. I haven't figured out yet why it behaves this way.
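One hypothetical way to narrow this down (an assumption on my part, not something verified against the 55-million-row dataset) is to separate the per-element cast error from the error of the mean reduction itself: upcast the float32 column back to float64 before averaging. If the upcast mean matches the float64 mean closely, most of the drift comes from carrying out the sum on float32 data rather than from the cast:

```python
import numpy as np
import pandas as pd

# Smaller reproduction: 1 million values in the question's [0, 976.114] range.
rng = np.random.default_rng(0)
col = pd.Series(rng.uniform(0, 9.761140e+02, size=1_000_000))
col32 = col.astype(np.float32)

m64 = col.mean()                            # float64 data, float64 reduction
m32 = col32.mean()                          # reduction on float32 data
m32_up = col32.astype(np.float64).mean()    # same cast values, float64 reduction

print(abs(m64 - m32_up))   # cast error alone: tiny
print(abs(m64 - float(m32)))  # may be larger: includes reduction error
```

On this sketch, the cast-only difference stays negligible, which would point at the accumulation step as the place to investigate.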

