Float precision differs between elements in pandas dataframe

I am trying to read a dataframe from a CSV, do some calculations with it, and then export the results to another CSV. While doing that I noticed that the value 8.1e-202 is getting changed to 8.1000000000000005e-202. But all the other numbers are represented correctly.

Example:

An example.csv looks like this:

id,e-value
ID1,1e-20
ID2,8.1e-202
ID3,9.24e-203

If I do:

import pandas as pd
df = pd.read_csv("example.csv")
df.iloc[1]["e-value"]
>>> 8.1000000000000005e-202

df.iloc[2]["e-value"]
>>> 9.24e-203

Why is 8.1e-202 being altered but 9.24e-203 isn't?

I tried to change the datatype that pandas is using from the default

df["e-value"].dtype
>>> dtype('float64')

to numpy datatypes like this:

import numpy as np
df = pd.read_csv("./temp/test", dtype={"e-value" : np.longdouble})

but this will just result in:

df.iloc[1]["e-value"]
>>> 8.100000000000000522e-202

Can someone explain to me why this is happening? I can't replicate this problem with any other number. Everything bigger or smaller than 8.1e-202 seems to work normally.

Floating-point representation is not simple. Not every real number can be represented, and almost all of them (relatively speaking) are stored as approximations. Unlike integers, the precision varies with magnitude, and Python does not pin down the precision of float itself: it is whatever the platform's C double provides, which is IEEE 754 binary64 on virtually all current systems.
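To see the approximation directly, the sketch below uses only the standard library (decimal.Decimal and math.nextafter, the latter available from Python 3.9) to print the value that is actually stored for the number from the question, along with the next representable double above it. The variable names are illustrative and not part of the original post.

```python
import math
from decimal import Decimal

x = float("8.1e-202")  # CPython rounds the text to the nearest double

# Decimal(float) converts the stored binary value exactly, with no rounding,
# so this prints the number the machine really holds.
print(Decimal(x))

# The written value 8.1e-202 is not itself a double, so the two differ.
print(Decimal(x) == Decimal("8.1e-202"))  # False

# The next representable double above x; nothing can be stored in between.
neighbour = math.nextafter(x, math.inf)
print(repr(x), repr(neighbour))
print("gap between neighbours:", neighbour - x)
```

Any text that falls between x and neighbour has to be rounded to one of the two, which is why a parser that lands on the "wrong" side prints with many more digits.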

Each floating-point standard has its own set of real numbers that it can represent exactly. There is no workaround for that.
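The representable set is fixed, but a parser still has to pick one member of it for each piece of text. pandas.read_csv has a float_precision parameter for the C engine, and "round_trip" selects a slower converter meant to read values back the way Python's own float() would. The sketch below is one way to check whether the default converter, rather than the format itself, explains the extra digits in the question. That the default converter is the cause here is an assumption, and the output of the first read may differ between pandas versions. The CSV text is inlined through io.StringIO purely for illustration.

```python
import io

import pandas as pd

# The example file from the question, inlined so the snippet is self-contained.
csv_text = "id,e-value\nID1,1e-20\nID2,8.1e-202\nID3,9.24e-203\n"

# Default converter: fast, but (assumption) it may not always return the
# double closest to the written text.
df_default = pd.read_csv(io.StringIO(csv_text))

# Round-trip converter: slower, aims to match what float("8.1e-202") gives.
df_round_trip = pd.read_csv(io.StringIO(csv_text), float_precision="round_trip")

print(df_default["e-value"].tolist())
print(df_round_trip["e-value"].tolist())
```

If the two lists differ only in the second entry, the difference comes from how the text was parsed, not from a loss of precision in later calculations.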

https://en.wikipedia.org/wiki/Single-precision_floating-point_format
https://en.wikipedia.org/wiki/IEEE_754-2008_revision

If the real concern is arithmetic or comparison, consider whether the error will grow or shrink along the way. For example, multiplying by a large number scales up the absolute representation error.

Also, when comparing floats you should use something like math.isclose, which checks whether the distance between the two numbers is within a tolerance.
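A minimal sketch of that comparison, using the two values from the question. The abs_tol value at the end is an arbitrary choice for illustration, not a recommendation.

```python
import math

a = 8.1e-202                 # value as written in the CSV
b = 8.1000000000000005e-202  # value reported after reading the CSV

print(a == b)              # False: the two are different doubles
print(math.isclose(a, b))  # True: they agree within the default rel_tol of 1e-09

# rel_tol scales with the operands, so it still works at this magnitude.
# Comparing against exactly 0.0 is the usual exception and needs abs_tol.
print(math.isclose(a, 0.0))                 # False
print(math.isclose(a, 0.0, abs_tol=1e-12))  # True
```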

If you are trying to represent and operate on real numbers that are not irrational, such as integers, fractions, or decimal numbers with a finite number of digits, you can also consider converting to an exact numeric type such as int, decimal.Decimal, or fractions.Fraction.
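As a rough illustration of the decimal route, the sketch below reads the example file with the standard csv module and keeps each e-value as a decimal.Decimal, so the digits written in the file are preserved. Reading with the csv module instead of pandas is a choice made here to keep the example dependency-free; it is not what the question does.

```python
import csv
import io
from decimal import Decimal

# The example file from the question, inlined so the snippet is self-contained.
csv_text = "id,e-value\nID1,1e-20\nID2,8.1e-202\nID3,9.24e-203\n"

# Decimal parses the text in base 10, so the written digits are kept exactly.
rows = [
    (row["id"], Decimal(row["e-value"]))
    for row in csv.DictReader(io.StringIO(csv_text))
]

for row_id, e_value in rows:
    # Doubling is exact here: each result fits well within the default
    # 28 significant digits of the decimal context.
    print(row_id, e_value, e_value * 2)
```

The trade-off is speed: Decimal arithmetic is much slower than float64, so this mainly pays off when the exact written digits matter more than throughput.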

See this for further ideas: https://davidamos.dev/the-right-way-to-compare-floats-in-python/#:~:text=How%20To%20Compare%20Floats%20in%20Python&text=If%20abs(a%20%2D%20b),rel_tol%20keyword%20argument%20of%20math
