Float issue when using list(zip(…)) on Dataframe float32 columns

While trying to create a tuple column consisting of latitude and longitude coordinates from two separate columns, I stumbled upon zip as a pretty fast alternative to itertuples, list comprehensions, etc. It needs to be fast because I am dealing with roughly 4M rows and I don't want to waste my time on attribute creation.

The good thing is that the question practically asks itself once you look at the output of the code below: what is happening, and how can it be prevented? I am absolutely positive that e.g. 52.353500 is as precise as it gets and that the Dataframe is not just cutting it off for display, because this already equals a (very rough) positional precision of 10 centimeters.

print(df['lat'].head())
print(df['long'].head())
list(zip(df['lat'].head(), df['long'].head()))

Output:

14    52.353500
37    52.355511
42    52.354019
44    52.373829
83    52.354599
Name: lat, dtype: float32

14    5.00611
37    4.90732
42    4.92045
44    4.84816
83    4.89405
Name: long, dtype: float32

[(52.35350036621094, 5.006110191345215),
 (52.35551071166992, 4.907320022583008),
 (52.35401916503906, 4.920450210571289),
 (52.37382888793945, 4.8481597900390625),
 (52.35459899902344, 4.894050121307373)]

As requested: the Dataframe was loaded using read_csv with dtype float32 for both columns.

Solution: It was a mixture of me not knowing the limitations of how a Series displays floats, not using float_precision when reading the data in, and using float32 in combination with float_precision. Kids, use the float dtype and let Pandas decide (to use float64).
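To make the float32 part of this concrete, here is a minimal sketch. It assumes a small in-memory CSV built from two of the coordinates shown above (the sample text and variable names are made up for illustration) and reads it once as float32 and once as float64 with float_precision="round_trip". float32 holds only about 7 significant digits, so the stored value is the nearest representable one, and the extra digits appear once it is widened to a Python float.

```python
import io

import pandas as pd

# Hypothetical sample built from two of the coordinates in the question.
csv_text = "lat,long\n52.353500,5.00611\n52.355511,4.90732\n"

# float32 stores the nearest representable value, which is not exactly
# the decimal number written in the file.
df32 = pd.read_csv(io.StringIO(csv_text), dtype="float32")

# float64 with the round-trip parser reproduces these short decimals.
df64 = pd.read_csv(io.StringIO(csv_text), float_precision="round_trip")

# tolist() widens each element to a Python float (a float64).
lat32 = df32["lat"].tolist()
lat64 = df64["lat"].tolist()

print(lat32)  # the float32 values, shown with their full float64 expansion
print(lat64)  # the decimals as written in the file
```

The float32 column is not wrong, only less precise than the decimal text suggests; the Series display simply hides the difference.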

This is perfectly well-defined behaviour: pandas rounds the digits it displays to a preset precision, while the stored values are left unchanged:

import math
import pandas as pd

math.pi
# 3.141592653589793

pi is printed with 15 decimal places here. However, in a Series it is not displayed that way:

pd.Series([math.pi])

0    3.141593
dtype: float64

pd.Series([math.pi]).tolist()
# [3.141592653589793]

This is because the display precision defaults to 6 decimal places:

pd.get_option('display.precision')
# 6

Read more about Options and Settings and how you can change them.
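As a small sketch of changing that setting, the snippet below uses pd.option_context to raise the display precision temporarily. The variable names are only illustrative; the point is that the printed representation changes while the stored float64 does not.

```python
import math

import pandas as pd

s = pd.Series([math.pi])

# Temporarily show 10 decimal places; the setting reverts after the block.
with pd.option_context("display.precision", 10):
    wide = repr(s)

# The same Series rendered with 6 decimal places.
with pd.option_context("display.precision", 6):
    narrow = repr(s)

print(wide)
print(narrow)

# The underlying value is identical in both cases.
print(s.tolist())
```

pd.set_option("display.precision", 10) would apply the same change for the whole session instead of a single block.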

If you want to actually round your floats to a certain precision, use round:

pd.Series([math.pi]).round(decimals=6).tolist()
# [3.141593]
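Applied to the coordinate columns from the question, one possible approach is to widen the float32 columns to float64 and round them before zipping. This is a sketch under assumptions: the two-row frame is rebuilt by hand from values in the question, and 6 decimals is taken from the precision the asker says the data has.

```python
import numpy as np
import pandas as pd

# Hypothetical frame rebuilt from two rows of the question's data.
df = pd.DataFrame({
    "lat": np.array([52.353500, 52.355511], dtype="float32"),
    "long": np.array([5.00611, 4.90732], dtype="float32"),
})

# Widen to float64 first, then round to the assumed 6 decimals, so the
# float32 representation error is rounded away.
lat = df["lat"].astype("float64").round(6)
lon = df["long"].astype("float64").round(6)

coords = list(zip(lat.tolist(), lon.tolist()))
print(coords)
```

Rounding only helps when the number of meaningful decimals is known; loading the columns as float64 in the first place avoids the problem without that assumption.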
