This scenario is a simplification of an ETL scenario involving multiple sets of data pulled from MySQL tables. I have a merged dataframe where one price column is type float64 and the other is type object.
import pandas as pd
df = pd.DataFrame({
    'price1': [0.066055],
    'price2': ['0.066055'],
})
price_cols = ['price1', 'price2']  # columns converted in the steps below
>>> df.dtypes
price1 float64
price2 object
dtype: object
When these two columns are converted to float64, the column price2 is rounded incorrectly when rounded to 5 digits.
float64_df = df[price_cols].apply(lambda x: pd.to_numeric(x))
>>> float64_df.dtypes
price1 float64
price2 float64
dtype: object
>>> float64_df[price_cols].apply(lambda x: x.round(5))
price1 price2
0 0.06606 0.06605
However, when the columns are converted to float32 using downcast='float', the rounding works as expected.
float32_df = df[price_cols].apply(lambda x: pd.to_numeric(x, downcast='float'))
>>> float32_df.dtypes
price1 float32
price2 float32
dtype: object
>>> float32_df[price_cols].apply(lambda x: x.round(5))
price1 price2
0 0.06606 0.06606
Any ideas why the rounding doesn't work properly when both columns are of type float64?
Printing the floats with higher precision shows that pd.to_numeric converted '0.066055' to 0.06605499999999998872.
with pd.option_context('display.float_format', '{:0.20f}'.format):
    print(float64_df)
Output:
price1 price2
0 0.06605500000000000260 0.06605499999999998872
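The two values printed above are adjacent float64 numbers that sit on opposite sides of the decimal midpoint 0.066055, which is why they round differently. The following is a minimal sketch using only the standard library, not pandas itself: the name one_ulp_lower is made up for illustration, and it rebuilds the string-converted value by stepping one float64 spacing (2**-56 in this range) below the literal, which matches the digits shown above.

```python
from decimal import Decimal

midpoint = Decimal("0.066055")        # the exact decimal value we intended
parsed = 0.066055                     # nearest float64 to that decimal
one_ulp_lower = parsed - 2.0 ** -56   # adjacent float64 just below (hypothetical stand-in
                                      # for the value pd.to_numeric produced from the string)

# Decimal(float) shows the exact binary value stored in each float.
print(Decimal(parsed))
print(Decimal(one_ulp_lower))

# One value lies just above the midpoint, the other just below it.
print(Decimal(parsed) > midpoint)         # True
print(Decimal(one_ulp_lower) < midpoint)  # True

# So rounding to 5 digits sends them in opposite directions.
print(round(parsed, 5))         # 0.06606
print(round(one_ulp_lower, 5))  # 0.06605
```

Neither result is a rounding bug as such: each float is rounded correctly from the value it actually holds.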
The short answer is that pd.to_numeric outputs different values for the two:
pd.to_numeric(0.066055)    # 0.066055
pd.to_numeric('0.066055')  # 0.06605499999999999
In the case of 0.066055, it simply returns the value. In the case of '0.066055', I believe it uses this function for converting the string to a float.
This answer may also be helpful.
Most decimal fractions cannot be represented exactly in binary floating point, so a float only holds the nearest representable value. My guess is that converting the object column results in a float64 a little bit smaller than the original number, e.g. 0.066054999999999999 or something similar, which falls just below the rounding midpoint and gives the unexpected rounding result.
Python has some documentation about this.
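As a small illustration of that point, here is a sketch using Python's standard decimal module. It is one possible approach when exact half-up rounding of the original text matters, not something the question's pipeline is known to use; the variable names are made up for the example.

```python
from decimal import Decimal, ROUND_HALF_UP

# Binary floats only approximate most decimal fractions.
sum_is_exact = (0.1 + 0.2 == 0.3)
print(sum_is_exact)  # False

# Parsing the original string as a Decimal keeps the exact decimal value,
# so half-up rounding to 5 places behaves the way the digits suggest.
price = Decimal("0.066055")
rounded = price.quantize(Decimal("0.00001"), rounding=ROUND_HALF_UP)
print(rounded)  # 0.06606
```

The trade-off is that Decimal values are stored as Python objects, so this is slower than float64 arithmetic on large columns.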