Float issue when using list(zip(...)) on DataFrame float32 columns

Question

While trying to create a tuple column of latitude and longitude coordinates from two separate columns, I stumbled upon zip as a pretty fast alternative to itertuples, list comprehensions, etc. It needs to be fast because I am dealing with roughly 4M rows and I don't want to waste time on attribute creation.

My question practically asks itself once you look at the output of the code below: what is happening, and how can it be prevented? I am absolutely positive that e.g. 52.353500 is as precise as it gets and that the DataFrame is not just cutting it off for display, because this already corresponds to a (very rough) positional precision of 10 centimeters.

print(df['lat'].head())
print(df['long'].head())
list(zip(df['lat'].head(), df['long'].head()))

Output:

14    52.353500
37    52.355511
42    52.354019
44    52.373829
83    52.354599
Name: lat, dtype: float32

14    5.00611
37    4.90732
42    4.92045
44    4.84816
83    4.89405
Name: long, dtype: float32

[(52.35350036621094, 5.006110191345215),
 (52.35551071166992, 4.907320022583008),
 (52.35401916503906, 4.920450210571289),
 (52.37382888793945, 4.8481597900390625),
 (52.35459899902344, 4.894050121307373)]

As requested: the DataFrame was loaded using read_csv with dtype float32 for both columns.

Solution: It was a mixture of me not knowing the limitations of the Series representation of floats, not using float_precision when reading the data in, and using float32 in combination with float_precision. Kids, use the float dtype and let pandas decide (to use float64).
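As a minimal sketch of the point above (the two-row CSV and the names csv_text, lat32 and widened are made up for illustration, not taken from the original data): float32 cannot store 52.3535 exactly, and converting the value to a Python float, which is what zip ends up doing, exposes the stored approximation. Reading the same text as float64 avoids this.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical two-row CSV standing in for the real 4M-row file.
csv_text = "lat,long\n52.353500,5.00611\n52.355511,4.90732\n"

# float32 stores only the nearest representable value; widening it to a
# Python float (a 64-bit double) reveals the extra digits.
lat32 = np.float32(52.3535)
widened = float(lat32)
print(widened)

# Reading as float64 with the round-trip parser keeps each value equal to
# the Python float written with the same digits as in the file.
df = pd.read_csv(io.StringIO(csv_text), dtype="float64", float_precision="round_trip")
df["coords"] = list(zip(df["lat"], df["long"]))
print(df["coords"].tolist())
```

The cost is memory: float64 columns take twice the space of float32 ones, which may matter at 4M rows.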

Answer 1

This is perfectly well-defined behaviour: pandas rounds the displayed digits to the preset display precision, while the stored values keep their full precision:

import math
import pandas as pd

math.pi
# 3.141592653589793

pi is shown with 15 decimal places here. However, in a Series, it is not displayed that way:

pd.Series([math.pi])

0    3.141593
dtype: float64

pd.Series([math.pi]).tolist()
# [3.141592653589793]

This is because:

pd.get_option('display.precision')
# 6

Read more about Options and Settings and how you can change them.
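A short sketch of changing that option, assuming the default display precision of 6 is still in effect (the names s and wide_repr are only for illustration): pd.option_context raises the number of displayed decimals temporarily, and the stored data is never modified.

```python
import math

import pandas as pd

s = pd.Series([math.pi])

# The display precision only affects how values are printed.
print(pd.get_option("display.precision"))
print(s)

# Temporarily show more decimals; the option is restored on exit.
with pd.option_context("display.precision", 15):
    wide_repr = repr(s)
print(wide_repr)
```

Use pd.set_option("display.precision", 15) instead if the change should persist for the whole session.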

If you want to actually round your floats to a certain precision, use round:

pd.Series([math.pi]).round(decimals=6).tolist()
# [3.141593]
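For float32 columns like those in the question, rounding alone is not enough, because the rounded result is still float32 and shows extra digits again once it is widened. A hedged sketch, using a small hand-built DataFrame in place of the real data and assuming the source values have 6 and 5 decimals: cast to float64 first, then round, then zip.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the float32 columns from the question.
df = pd.DataFrame({
    "lat": np.array([52.3535, 52.355511], dtype="float32"),
    "long": np.array([5.00611, 4.90732], dtype="float32"),
})

# Widen to float64 first, then round to the precision of the source data,
# so the float32 representation error is rounded away.
lat = df["lat"].astype("float64").round(6)
lon = df["long"].astype("float64").round(5)
pairs = list(zip(lat, lon))
print(pairs)
```

This only repairs the printed digits; the information lost when the data was first stored as float32 is not recovered, so reading the file as float64 remains the cleaner fix.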

Answer 1
2 Accepted 2019-06-04 13:47:41
